llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5d567dc137	AMDGPU: Enable function calls by default Fixes some crashes on illegal call situations which are unfortunately still valid IR. llvm-svn: 355051	2019-02-28 00:40:32 +00:00
Abderrazek Zaafrani	2fc498a652	[AArch64] Generate FP16 vector compare instructions. https://reviews.llvm.org/D58561 llvm-svn: 355050	2019-02-28 00:31:38 +00:00
Matt Arsenault	aa03bcd23c	AMDGPU: Fix crashes in invalid call cases We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite. llvm-svn: 355049	2019-02-28 00:28:44 +00:00
Matt Arsenault	d3093c2f1f	GlobalISel: Implement fewerElementsVector for phi llvm-svn: 355048	2019-02-28 00:16:32 +00:00
Matt Arsenault	72bcf15dbf	GlobalISel: Implement moreElementsVector for phi llvm-svn: 355047	2019-02-28 00:01:05 +00:00
Reid Kleckner	4fb3502bc9	[InstrProf] Use separate comdat group for data and counters Summary: I hadn't realized that instrumentation runs before inlining, so we can't use the function as the comdat group. Doing so can create relocations against discarded sections when references to discarded __profc_ variables are inlined into functions outside the function's comdat group. In the future, perhaps we should consider standardizing the comdat group names that ELF and COFF use. It will save object file size, since __profv_$sym won't appear in the symbol table again. Reviewers: xur, vsk Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58737 llvm-svn: 355044	2019-02-27 23:38:44 +00:00
Alina Sbirlea	fcfa7c5f92	[MemorySSA] Make insertDef insert corresponding phi nodes. Summary: The original assumption for the insertDef method was that it would not materialize Defs out of nowhere, hence it will not insert phis needed after inserting a Def. However, when cloning an instruction (use case used in LICM), we do materialize Defs "out of nowhere". If the block receiving a Def has at least one other Def, then no processing is needed. If the block just received its first Def, we must check where Phi placement is needed. The only new usage of insertDef is in LICM, hence the trigger for the bug. But the original goal of the method also fails to apply for the move() method. If we move a Def from the entry point of a diamond to either the left or right blocks, then the merge block must add a phi. While this use case does not currently occur, or may be viewed as an incorrect transformation, MSSA must behave correctly given the scenario. Resolves PR40749 and PR40754. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58652 llvm-svn: 355040	2019-02-27 22:20:22 +00:00
Joerg Sonnenberger	6a198366a0	Default to Secure PLT on PPC for NetBSD and OpenBSD. This matches the default settings of clang. llvm-svn: 355038	2019-02-27 21:53:14 +00:00
James Y Knight	f33b1f49b7	Fixup compilation/test failures after r354960 and r355013. llvm-svn: 355034	2019-02-27 21:47:35 +00:00
Matt Davis	1d5c23523e	[llvm-cxxfilt] Re-enable split and demangle stdin input on certain non-alphanumerics. This restores the patch that splits demangled stdin input on non-alphanumerics. I had reverted this patch earlier because it broke Windows build-bots. I have updated the test so that it passes on Windows. I was running the test from powershell and never saw the issue until I switched to the mingw shell. This reverts commit `628ab5c682`. llvm-svn: 355031	2019-02-27 21:39:11 +00:00
Evgeniy Stepanov	f46a52b536	[hwasan, asan] Intercept vfork. Summary: Intercept vfork on arm, aarch64, i386 and x86_64. Reviewers: pcc, vitalybuka Subscribers: kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, #sanitizers, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58533 llvm-svn: 355030	2019-02-27 21:11:50 +00:00
Philip Reames	288a95fc8c	Separate volatility and atomicity/ordering in SelectionDAG At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc.). This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success. To make sure this patch itself is NFC, it is built on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll). Previously landed D57593 Fix a bug in the definition of isUnordered on MachineMemOperand D57596 [CodeGen] Be conservative about atomic accesses as for volatile D57802 Be conservative about unordered accesses for the moment rL353959: [Tests] First batch of cornercase tests for unordered atomics. rL353966: [Tests] RMW folding tests w/unordered atomic operations. rL353972: [Tests] More unordered atomic lowering tests. rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface rL354740: [Hexagon, SystemZ] Be super conservative about atomics rL354800: [Lanai] Be super conservative about atomics rL354845: [ARM] Be super conservative about atomics Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.) Differential Revision: https://reviews.llvm.org/D57601 llvm-svn: 355025	2019-02-27 20:20:08 +00:00
Rong Xu	ac552f77f4	Fixed ubsan failures in r355005. llvm-svn: 355023	2019-02-27 20:01:14 +00:00
Matt Davis	628ab5c682	Revert "[llvm-cxxfilt] Split and demangle stdin input on certain non-alphanumerics." This reverts commit `5cd5f8f256`. The test passes on linux, but fails on the windows build-bots. This test failure seems to be a quoting issue between my test and FileCheck on Windows. I'm reverting this patch until I can replicate and fix in my Windows environment. llvm-svn: 355021	2019-02-27 19:52:02 +00:00
Sanjay Patel	ac96a92d82	[InstCombine] add tests for add+ext+add; NFC llvm-svn: 355020	2019-02-27 19:27:45 +00:00
Simon Pilgrim	1001a6ab03	[X86][AVX] Pull out some INSERT_SUBVECTOR combines into a combineConcatVectorOps helper. NFCI A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node. I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids the issue of handling a number of pre-legalization issues that I've encountered. Differential Revision: https://reviews.llvm.org/D58583 llvm-svn: 355015	2019-02-27 18:46:32 +00:00
Matt Davis	7a24dbdfd3	[llvm-readobj] Print section type values for unknown sections. Summary: This patch displays a hexadecimal section value (Elf_Shdr::sh_type) or section-relative offset when printing unknown sections. Here is a subset of the output (ignoring the fields following "Type" when dumping an ELF's GNU `--section-headers` table). Section Headers: ``` [Nr] Name Type [16] android_rel LOOS+0x1 [17] android_rela LOOS+0x2 [27] unknown 0x1000: <unknown> [28] loos LOOS+0 [30] hios VERSYM [31] loproc LOPROC+0 [33] hiproc LOPROC+0xFFFFFFF [34] louser LOUSER+0 [36] hiuser LOUSER+0x7FFFFFFF ``` As a comparison, the previous output looked something like the above, but with a blank "Type" field: ``` [Nr] Name Type [27] unknown [28] loos [30] hios VERSYM [31] loproc [33] hiproc [34] louser [36] hiuser ``` This fixes PR40773 Reviewers: jhenderson, rupprecht, Bigcheese Reviewed By: jhenderson, rupprecht, Bigcheese Subscribers: MaskRay, Bigcheese, srhines, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58701 llvm-svn: 355014	2019-02-27 18:39:17 +00:00
Alexey Lapshin	d89d638055	Attempt to fix buildbot after r354972 [#1]. NFCI. llvm-svn: 355013	2019-02-27 18:36:46 +00:00
Matt Davis	69bec61998	[llvm-cxxfilt] Re-enable the delimiters test on Windows. The original intent was to enable this test for Windows; however, that initial patch broke one of the build-bots. I temporarily disabled this test on Windows until that issue was resolved. It was resolved in my previous patch (cfd1d9742ee2d1b8dd6b7), and now I am re-enabling this test. llvm-svn: 355011	2019-02-27 18:04:21 +00:00
Matt Davis	eaa895368b	Clean up the delimiters test. Ideally this is a NFCI, used single quotes in most cases. Hopefully this will make the Windows bot happy. I've marked this unsupported on Windows until I get my Windows box set up with this patch to test. I'll remove that constraint after I'm confident this will pass on Windows. I just want to silence the buildbots for now. llvm-svn: 355007	2019-02-27 17:39:36 +00:00
Rong Xu	6cdf3d8086	Recommit r354930 "[PGO] Context sensitive PGO (part 1)" Fixed UBSan failures. llvm-svn: 355005	2019-02-27 17:24:33 +00:00
James Henderson	416603e32a	[llvm-readobj]Add additional testing for various ELF features This patch adds testing of areas of the code that are not fully tested, in particular dynamic table printing, ELF type printing, handling of edge cases where things are missing/empty (relocations/program header tables/section header table), and the --string-dump switch. Reviewed by: grimar, higuoxing, rupprecht Differential Revision: https://reviews.llvm.org/D58677 llvm-svn: 355003	2019-02-27 16:41:59 +00:00
Xing GUO	d78164a8ab	[llvm-objdump] Should print strings when dumping DT_RPATH, DT_RUNPATH, DT_SONAME, DT_AUXILIARY and DT_FILTER tags in dynamic section. Summary: Before: ``` Dynamic Section: NEEDED libpthread.so.0 ... NEEDED ld-linux-x86-64.so.2 RPATH 0x00000000001c2e61 ``` After: ``` Dynamic Section: NEEDED libpthread.so.0 ... NEEDED ld-linux-x86-64.so.2 RPATH $ORIGIN/../lib ``` Only a small problem here: I am not sure how to choose a test case. I see there's a test file (test/tools/llvm-objdump/private-headers-dynamic-section.test), but it has no DT_RPATH and DT_RUNPATH tags. Shall I replace the ELF file in the Inputs dir with a new one? Reviewers: jhenderson, grimar Reviewed By: jhenderson Subscribers: srhines, rupprecht, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58707 llvm-svn: 355001	2019-02-27 16:37:15 +00:00
Matt Davis	5cd5f8f256	[llvm-cxxfilt] Split and demangle stdin input on certain non-alphanumerics. Summary: This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling. Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled. From what I have tested, GNU c++filt also splits input on any character that does not make up the mangled name (notably commas, but also a large set of non-alphanumeric characters). This patch splits stdin input on any character that does not belong to the Itanium mangling format (since Itanium is currently the only supported format in llvm-cxxfilt). This is an update to PR39990 Reviewers: jhenderson, tejohnson, compnerd Reviewed By: compnerd Subscribers: erik.pilkington, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58416 llvm-svn: 354998	2019-02-27 16:29:50 +00:00
Nikita Popov	a54fe15610	[InstCombine] Add additional add.sat overflow tests; NFC Baseline for D58593. llvm-svn: 354996	2019-02-27 16:18:29 +00:00
Sanjay Patel	e375074dff	[InstCombine] regenerate complete checks; NFC llvm-svn: 354993	2019-02-27 15:59:30 +00:00
Nico Weber	106db04a80	gn build: Merge r354989 llvm-svn: 354991	2019-02-27 15:46:51 +00:00
Nico Weber	bfdfa8d99c	gn build: Merge r354692 llvm-svn: 354987	2019-02-27 15:29:14 +00:00
Eugene Leviant	7f78d4712f	[DebugInfo] Apply subprogram attributes on behalf of owner CU When using full LTO it is possible that a template function definition DIE is bound to one compilation unit and its declaration to another. We should add function declaration attributes on behalf of its owner CU, otherwise we may end up with a malformed file identifier in the function declaration DW_AT_decl_file attribute. Differential revision: https://reviews.llvm.org/D58538 llvm-svn: 354978	2019-02-27 14:46:59 +00:00
Dmitry Preobrazhensky	7904231edb	[AMDGPU][MC] Added register size check for VOP3/SDWA/DPP operands See bug 37943: https://bugs.llvm.org/show_bug.cgi?id=37943 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58287 llvm-svn: 354974	2019-02-27 13:58:48 +00:00
Alexey Lapshin	77fc1f6049	[DebugInfo] add SectionedAddress to DebugInfo interfaces. This patch fixes https://bugs.llvm.org/show_bug.cgi?id=40703, the "wrong line number info for obj file compiled with -ffunction-sections" bug. The problem happened only with .o files. If an object file contains several .text sections, line number information was shown incorrectly. The reason is that DwarfLineTable could not detect the section corresponding to the specified address (because the address is local to the section), and as a result it could not select the proper sequence in the line table. The fix is to pass SectionIndex with the address, so that addresses from different sections can be differentiated. With this fix llvm-objdump shows correct line numbers for disassembled code. Differential review: https://reviews.llvm.org/D58194 llvm-svn: 354972	2019-02-27 13:17:36 +00:00
Dmitry Preobrazhensky	ef92035827	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969	2019-02-27 13:12:12 +00:00
George Rimar	79fb858053	[llvm-objcopy] - Check for invalidated relocations when removing a section. This is https://bugs.llvm.org/show_bug.cgi?id=40818 Removing a section that is used by relocation is an error we did not report. The patch fixes that. Differential revision: https://reviews.llvm.org/D58625 llvm-svn: 354962	2019-02-27 11:18:27 +00:00
Simon Pilgrim	71bb6850cf	[X86][AVX] Only combine loads to broadcasts for legal types Thanks to @echristo for spotting this. llvm-svn: 354961	2019-02-27 11:17:25 +00:00
James Henderson	5b27402bee	[llvm-readobj]Fix error messages for bad archive members and add testing for archive handling llvm-readobj's error messages were broken for bad archive members. This patch fixes them, and also adds testing for archive and thin archive handling within llvm-readobj. Reviewed by: rupprecht, grimar, higuoxing Differential Revision: https://reviews.llvm.org/D58681 llvm-svn: 354960	2019-02-27 11:07:08 +00:00
Simon Pilgrim	65706cf715	Fix Wenum-compare gcc7 warning. NFCI. llvm-svn: 354958	2019-02-27 10:19:53 +00:00
Fangrui Song	73f16996de	[llvm-readobj] Print DF_1_DISPRELPND The test will be added by D58677. llvm-svn: 354955	2019-02-27 05:37:11 +00:00
Yonghong Song	cc290a9e91	[BPF] Don't fail for static variables Currently, LLVM prints an error like Unsupported relocation: try to compile with -O2 or above, or check your static variable usage if the user defines more than one static variable in a single ELF section (e.g., .bss or .data). There is ongoing effort to support static and global variables in libbpf and kernel. This patch removed the assertion so user programs with static variables won't fail compilation. The static variable in-section offset is written to the "imm" field of the corresponding to-be-relocated bpf instruction. Below is an example to show how the application (e.g., libbpf) can relate variable to relocations. -bash-4.4$ cat g1.c static volatile long a = 2; static volatile int b = 3; int test() { return a + b; } -bash-4.4$ clang -target bpf -O2 -c g1.c -bash-4.4$ llvm-readelf -r g1.o Relocation section '.rel.text' at offset 0x158 contains 2 entries: Offset Info Type Symbol's Value Symbol's Name 0000000000000000 0000000400000001 R_BPF_64_64 0000000000000000 .data 0000000000000018 0000000400000001 R_BPF_64_64 0000000000000000 .data -bash-4.4$ llvm-readelf -s g1.o Symbol table '.symtab' contains 6 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c 2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a 3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test -bash-4.4$ llvm-objdump -d g1.o g1.o: file format ELF64-BPF Disassembly of section .text: 0000000000000000 test: 0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 2: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0) 3: 18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00 r2 = 8 ll 5: 61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0) 6: 0f 10 00 00 00 00 00 00 r0 += r1 7: 95 00 00 00 00 00 00 00 exit -bash-4.4$ . from symbol table, static variable "a" is in section #4, offset 0. . from symbol table, static variable "b" is in section #4, offset 8. . the first relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 0 (see llvm-objdump result) . the second relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 8 (see llvm-objdump result) . therefore, the first relocation is for variable "a", and the second relocation is for variable "b". Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 354954	2019-02-27 05:36:15 +00:00
Vlad Tsyrklevich	c01643087e	Revert "[PGO] Context sensitive PGO (part 1)" This reverts commit r354930, it was causing UBSan failures. llvm-svn: 354953	2019-02-27 03:45:28 +00:00
Saleem Abdulrasool	b67342e7cb	Support: enable backtraces on Windows Some platforms, e.g. Windows, support backtraces but don't have BACKTRACE. Checking for BACKTRACE prevents Windows from having backtraces. Patch by Jason Mittertreiner! llvm-svn: 354951	2019-02-27 03:21:50 +00:00
Heejin Ahn	82da1ffc16	[WebAssembly] Fix ScopeTops info in CFGStackify for EH pads Summary: When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we should create not only `end_try` -> `try` mapping but also `catch` -> `try` mapping as well. If this is not created, `block` and `end_block` markers later added may span across an existing `catch`, resulting in the incorrect code like: ``` try block --| (X) catch | end_block --| end_try ``` Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58605 llvm-svn: 354945	2019-02-27 01:35:14 +00:00
Jonas Devlieghere	bb111152b7	[DWARFFormValue] Cleanup DWARFFormValue interface. (NFC) DWARFFormValues can be created from a data extractor or by passing its value directly. Until now this was done by member functions that modified an existing object's internal state. This patch replaces a subset of these methods with static method that return a new DWARFFormValue. llvm-svn: 354941	2019-02-27 00:58:09 +00:00
Heejin Ahn	cf699b4534	[WebAssembly] Remove unnecessary instructions after TRY marker placement Summary: This removes unnecessary instructions after TRY marker placement. There are two cases: - `end`/`end_block` can be removed if they overlap with `try`/`end_try` and they have the same return types. - `br` right before `catch` that branches to after `end_try` can be deleted. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58591 llvm-svn: 354939	2019-02-27 00:50:53 +00:00
Jonas Paulsson	129826cd9f	[SystemZ] Pass regalloc hints to help Load-and-Test transformations. Since there is no "Load-and-Test-High" instruction, the 32 bit load of a register to be compared with 0 can only be implemented with LT if the virtual GRX32 register ends up in a low part (GR32 register). This patch detects these cases and passes the GR32 registers (low parts) as (soft) hints in getRegAllocationHints(). Review: Ulrich Weigand. llvm-svn: 354935	2019-02-27 00:18:28 +00:00
Saleem Abdulrasool	427aeb3ad2	vim: `swiftself` is an attribute Highlight the `swiftself` attribute on parameters. llvm-svn: 354934	2019-02-27 00:12:11 +00:00
Vedant Kumar	73522d1678	[HotColdSplit] Disable splitting for sanitized functions Splitting can make sanitizer errors harder to understand, as the trapping instruction may not be in the function where the bug was detected. rdar://48142697 llvm-svn: 354931	2019-02-26 22:55:46 +00:00
Rong Xu	35d2d51369	[PGO] Context sensitive PGO (part 1) Current PGO profile counts are not context sensitive. The branch probabilities for the inlined functions are kept the same for all call-sites, and they might be very different from the actual branch probabilities. These suboptimal profiles can greatly affect some downstream optimizations, in particular for the machine basic block placement optimization. In this patch, we propose to have a post-inline PGO instrumentation/use pass, which we called Context Sensitive PGO (CSPGO). Users who want the best possible performance can perform a second round of PGO instrument/use on top of the regular PGO. They will have two sets of profile counts. The first pass profile will be mainly for inline, indirect-call promotion, and CGSCC simplification pass optimizations. The second pass profile is for post-inline optimizations and code-gen optimizations. A typical usage: // Regular PGO instrumentation and generate pass1 profile. > clang -O2 -fprofile-generate source.c -o gen > ./gen > llvm-profdata merge default.profraw -o pass1.profdata // CSPGO instrumentation. > clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2 > ./gen2 // Merge two sets of profiles > llvm-profdata merge default.profraw pass1.profdata -o profile.profdata // Use the combined profile. Pass manager will invoke two PGO use passes. > clang -O2 -fprofile-use=profile.profdata -o use This change touches many components in the compiler. The reviewed patch (D54175) will be committed in phases. Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 354930	2019-02-26 22:37:46 +00:00
Stanislav Mekhanoshin	da1628eb67	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926	2019-02-26 20:56:25 +00:00
Eric Christopher	721eaeff3a	Fix a small comment typo. llvm-svn: 354923	2019-02-26 20:33:22 +00:00
Reid Kleckner	8fda7e15e6	[X86] Fix bug in vectorcall calling convention Original implementation can't correctly handle __m256 and __m512 types passed by reference through stack. This patch fixes it. Patch by Wei Xiao! Differential Revision: https://reviews.llvm.org/D57643 llvm-svn: 354921	2019-02-26 19:48:16 +00:00
Alina Sbirlea	9026404125	[MemorySSA & SimpleLoopUnswitch] Update MemorySSA in ReplaceUsesOfWith. SimpleLoopUnswitch must update MemorySSA when removing instructions. Resolves PR39197. llvm-svn: 354919	2019-02-26 19:44:52 +00:00
Craig Topper	d44db7e486	[X86] Use X86_CPU_SUBTYPE_COMPAT for 'cascadelake' cpu. This CPU is supported by at least libgcc trunk now so we should make it available to __builtin_cpu_is. llvm-svn: 354913	2019-02-26 19:17:12 +00:00
Julian Lettner	eb38a70d11	[lit] Allow setting parallelism groups to None Check that we do not crash if a parallelism group is explicitly set to None. Permits usage of the following pattern. [lit.common.cfg] lit_config.parallelism_groups['my_group'] = None if <condition>: lit_config.parallelism_groups['my_group'] = 3 [project/lit.cfg] config.parallelism_group = 'my_group' Reviewers: rnk Differential Revision: https://reviews.llvm.org/D58305 llvm-svn: 354912	2019-02-26 19:03:26 +00:00
Kristina Brooks	76eb4b02d9	Update docs of memcpy/move/set wrt. align and len Fix https://bugs.llvm.org/show_bug.cgi?id=38583: Describe how memcpy/memmove/memset behave when len=0. Also fix some fallout from when the alignment parameter was replaced by an attribute. This closes PR38583. Patch by RalfJung (Ralf) Differential Revision: https://reviews.llvm.org/D57600 llvm-svn: 354911	2019-02-26 18:53:13 +00:00
Andrew Ng	f38b005321	[TableGen] Make OpcodeMappings sort comparator deterministic NFCI The previous sort comparator was not deterministic, i.e. in some situations it would be possible for lhs < rhs && rhs < lhs. This was discovered by an STL assertion in a Windows debug build of llvm-tblgen. Differential Revision: https://reviews.llvm.org/D58687 llvm-svn: 354910	2019-02-26 18:50:49 +00:00
Sanjay Patel	9dada83d6c	[InstSimplify] remove zero-shift-guard fold for general funnel shift As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html We can't remove the compare+select in the general case because we are treating funnel shift like a standard instruction (as opposed to a special instruction like select/phi). That means that if one of the operands of the funnel shift is poison, the result is poison regardless of whether we know that the operand is actually unused based on the instruction's particular semantics. The motivating case for this transform is the more specific rotate op (rather than funnel shift), and we are preserving the fold for that case because there is no chance of introducing extra poison when there is no anonymous extra operand to the funnel shift. llvm-svn: 354905	2019-02-26 18:26:56 +00:00
Petar Avramovic	bd39569913	[MIPS GlobalISel] Select G_UADDO Lower G_UADDO. Legalize G_UADDO for MIPS32 Differential Revision: https://reviews.llvm.org/D58671 llvm-svn: 354900	2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian	e172d7008d	[X86] AMD znver2 enablement This patch enables the following 1) AMD family 17h "znver2" tune flag (-march, -mcpu). 2) ISAs that are enabled for "znver2" architecture. 3) For the time being, it uses the znver1 scheduler model. 4) Tests are updated. 5) Scheduler descriptions are yet to be put in place. Reviewers: craig.topper Differential Revision: https://reviews.llvm.org/D58343 llvm-svn: 354897	2019-02-26 16:55:10 +00:00
Jonas Paulsson	c110b5b69f	[SystemZ] Wait with selection of legal vector/FP constants until Select(). This patch aims to make sure that any such constant that can be generated with a vector instruction (for example VGBM) is recognized as such during legalization and kept as a target independent node through post-legalize DAGCombining. Two new functions named isVectorConstantLegal() and loadVectorConstant() replace old ways of handling vector/FP constants. A new struct named SystemZVectorConstantInfo is used to cache the results of isVectorConstantLegal() and pass them onto loadVectorConstant(). Support for fp128 constants in the presence of FeatureVectorEnhancements1 (z14) has been added. Review: Ulrich Weigand https://reviews.llvm.org/D58270 llvm-svn: 354896	2019-02-26 16:47:59 +00:00
Sanjay Patel	421c6e6864	[InstSimplify] add tests for rotate; NFC Rotate is a special-case of funnel shift that has different poison constraints than the general case. That's not visible yet in the existing tests, but it needs to be corrected. llvm-svn: 354894	2019-02-26 16:44:08 +00:00
Sanjay Patel	840f5d6dce	[InstCombine] remove duplicate (but not updated) tests; NFC Not sure how it happened, but rL354886 was a duplicate of rL354881, but not updated with rL354887. llvm-svn: 354889	2019-02-26 15:25:42 +00:00
Sanjay Patel	e8bf0f79bd	[InstCombine] canonicalize more unsigned saturated add with 'not' Yet another pattern variation suggested by: https://bugs.llvm.org/show_bug.cgi?id=14613 There are 8 more potential commuted patterns here on top of the 8 that were already handled (rL354221, rL354276, rL354393). We have the obvious commute of the 'add' + commute of the cmp predicate/operands (ugt/ult) + commute of the select operands: Name: base %notx = xor i32 %x, -1 %a = add i32 %notx, %y %c = icmp ult i32 %x, %y %r = select i1 %c, i32 -1, i32 %a => %c2 = icmp ult i32 %a, %y %r = select i1 %c2, i32 -1, i32 %a Name: ugt %notx = xor i32 %x, -1 %a = add i32 %notx, %y %c = icmp ugt i32 %y, %x %r = select i1 %c, i32 -1, i32 %a => %c2 = icmp ult i32 %a, %y %r = select i1 %c2, i32 -1, i32 %a Name: commute select %notx = xor i32 %x, -1 %a = add i32 %notx, %y %c = icmp ult i32 %y, %x %r = select i1 %c, i32 %a, i32 -1 => %c2 = icmp ult i32 %a, %y %r = select i1 %c2, i32 -1, i32 %a Name: ugt + commute select %notx = xor i32 %x, -1 %a = add i32 %notx, %y %c = icmp ugt i32 %x, %y %r = select i1 %c, i32 %a, i32 -1 => %c2 = icmp ult i32 %a, %y %r = select i1 %c2, i32 -1, i32 %a https://rise4fun.com/Alive/den llvm-svn: 354887	2019-02-26 15:18:49 +00:00
Sanjay Patel	c9af54bb55	[InstCombine] add more tests for saturated add; NFC llvm-svn: 354886	2019-02-26 15:18:44 +00:00
Nirav Dave	582d46328c	[DAG] Fix constant store folding to handle non-byte sizes. Avoid crashes from zero-byte values due to sub-byte store sizes. Reviewers: uabelho, courbet, rnk Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58626 llvm-svn: 354884	2019-02-26 15:02:32 +00:00
Simon Atanasyan	8cb497027d	[mips] Emit `.module softfloat` directive This change fixes crash on an assertion in case of using `soft float` ABI for mips32r6 target. llvm-svn: 354882	2019-02-26 14:45:17 +00:00
Sanjay Patel	0d4f9216aa	[InstCombine] add more tests for saturated add; NFC llvm-svn: 354881	2019-02-26 14:40:23 +00:00
Andrea Di Biagio	c032e2ab7c	[MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls. Dispatch stall cycles may be associated to multiple dispatch stall events. Before this patch, each stall cycle was associated with a single stall event. This patch also improves a couple of code comments, and adds a helper method to query the Scheduler for dispatch stalls. llvm-svn: 354877	2019-02-26 14:19:00 +00:00
George Rimar	b75bf8784e	[yaml2obj][obj2yaml] - Add support for the architecture specific dynamic tags. This allows tools to parse/dump the architecture specific tags like DT_MIPS_*, DT_PPC64_* and DT_HEXAGON_* Also fixes a bug in DynamicTags.def which was revealed in this patch. Differential revision: https://reviews.llvm.org/D58667 llvm-svn: 354876	2019-02-26 14:14:49 +00:00
Simon Pilgrim	d4a406e499	[AArch64] Add arithmetic zext bswap tests. As requested on D58017. llvm-svn: 354872	2019-02-26 13:22:35 +00:00
Xing GUO	85b50a7679	[llvm-objdump] Add `Version Definitions` dumper Summary: `llvm-objdump` needs a `Version Definitions` dumper. Reviewers: grimar, jhenderson Reviewed By: grimar, jhenderson Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58615 llvm-svn: 354871	2019-02-26 13:06:16 +00:00
Igor Kudrin	2d3faad706	[llvm-objdump] Implement -Mreg-names-raw/-std options. The --disassembler-options, or -M, are used to customize the disassembler and affect its output. The two implemented options allow selecting register names on ARM: * With -Mreg-names-raw, the disassembler uses rNN for all registers. * With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15, which is the default behavior of llvm-objdump. Differential Revision: https://reviews.llvm.org/D57680 llvm-svn: 354870	2019-02-26 12:15:14 +00:00
Simon Pilgrim	e42be1eae2	[AArch64] Add 'free' zext bswap tests. As requested on D58017. llvm-svn: 354869	2019-02-26 12:04:37 +00:00
Luke Cheeseman	9e285bef2b	[ARM] Add Cortex-M35P - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differential Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868	2019-02-26 12:02:12 +00:00
Simon Pilgrim	566177c3d5	[LegalizeDAG] Use APInt::getSplat helper to create bitreverse masks. NFCI. llvm-svn: 354867	2019-02-26 11:44:23 +00:00
Simon Pilgrim	810fa04ac7	[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763) If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing an ADD/SUB and a SADDSAT/SSUBSAT and then comparing the results. I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit. Differential Revision: https://reviews.llvm.org/D58637 llvm-svn: 354866	2019-02-26 11:27:53 +00:00
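The expansion described in the entry above can be sketched in plain C++. This is an illustration only, not the LegalizeDAG code: `wrappingAdd`, `saturatingAdd` and `addOverflows` are hypothetical names, and 32-bit integers stand in for whatever type is legal on the target. The idea is that a wrapping add and a saturating add produce the same value exactly when no signed overflow occurs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Wrapping add (stands in for ISD::ADD): add in unsigned arithmetic, then
// reinterpret the resulting bits as a signed value.
int32_t wrappingAdd(int32_t a, int32_t b) {
  uint32_t bits = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  int32_t result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

// Saturating add (stands in for SADDSAT): widen to 64 bits, then clamp.
int32_t saturatingAdd(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
  if (wide > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (wide < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(wide);
}

// Overflow flag (the second result of SADDO): the two sums differ only when
// the exact result does not fit in 32 bits.
bool addOverflows(int32_t a, int32_t b) {
  return wrappingAdd(a, b) != saturatingAdd(a, b);
}
```

For instance, `addOverflows(INT32_MAX, 1)` is true because the wrapping sum is `INT32_MIN` while the saturating sum stays at `INT32_MAX`.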
Simon Pilgrim	2ccc120d19	[AMDGPU] Regenerate bswap/bitreverse tests. Make codegen changes more obvious in D58017 llvm-svn: 354863	2019-02-26 11:01:08 +00:00
Clement Courbet	0ddf81c43d	[llvm-exegesis] Teach llvm-exegesis to handle instructions with multiple tied variables. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58285 llvm-svn: 354862	2019-02-26 10:54:45 +00:00
Eugene Leviant	53350d0411	[llvm-objcopy] Add --set-start, --change-start and --adjust-start Differential revision: https://reviews.llvm.org/D58173 llvm-svn: 354854	2019-02-26 09:24:22 +00:00
Eugene Leviant	24b3d258bb	[ThinLTO] Use defined node and edge order when dumping DOT file Differential revision: https://reviews.llvm.org/D58631 llvm-svn: 354850	2019-02-26 07:38:21 +00:00
Vlad Tsyrklevich	c6d54ae9da	Revert "Improve "llvm-nm -f sysv" output for Elf files" This reverts commit r354833, it was causing ASan test failures on sanitizer-x86_64-linux-fast. llvm-svn: 354849	2019-02-26 07:04:56 +00:00
Chen Zheng	b9067e5990	[NFC] Add to contributor list. llvm-svn: 354847	2019-02-26 05:46:45 +00:00
Dan Gohman	c71132c0be	[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments For outgoing varargs arguments, it's necessary to check the OrigAlign field of the corresponding OutputArg entry to determine argument alignment, rather than just computing an alignment from the argument value type. This is because types like fp128 are split into multiple argument values, with narrower types that don't reflect the ABI alignment of the full fp128. This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase. Differential Revision: https://reviews.llvm.org/D58656 llvm-svn: 354846	2019-02-26 05:20:19 +00:00
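A minimal sketch of why the alignment source matters when laying out a varargs buffer. It is not the WebAssembly backend code: `alignUp` and `placeArg` are hypothetical helpers, and the sizes used in the usage note (a 4-byte int followed by a 16-byte fp128) are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Round `offset` up to the next multiple of `align` (a power of two).
uint64_t alignUp(uint64_t offset, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  return (offset + align - 1) & ~(align - 1);
}

// Place one argument of `size` bytes at the next suitably aligned offset,
// advance `cursor` past it, and return the offset that was chosen.
uint64_t placeArg(uint64_t &cursor, uint64_t size, uint64_t align) {
  uint64_t offset = alignUp(cursor, align);
  cursor = offset + size;
  return offset;
}
```

With an int placed first (`placeArg(cursor, 4, 4)` returns 0), an fp128 placed with its original 16-byte alignment lands at offset 16. If the alignment were instead derived from one of its 8-byte pieces, the first piece would land at offset 8, which is not where a callee following the ABI would look for it.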
Philip Reames	38b14e33a8	[ARM] Be super conservative about atomics As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Differential Revision: https://reviews.llvm.org/D58490 Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk. llvm-svn: 354845	2019-02-26 04:30:33 +00:00
Heejin Ahn	d2a56ac661	[WebAssembly] Fix a bug deleting instruction in a ranged for loop Summary: We shouldn't delete elements while iterating a ranged for loop. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58519 llvm-svn: 354844	2019-02-26 04:08:49 +00:00
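The hazard named in the entry above is general C++: a range-based for loop advances a hidden iterator after each iteration, so erasing the current element leaves that iterator invalid. A minimal sketch of a safe alternative follows, using `std::list<int>` as a stand-in for an instruction list. `eraseNegatives` is a hypothetical name, and whether the patch itself used this exact pattern is not stated in the log.

```cpp
#include <cassert>
#include <list>

// Remove every negative value from `values` while iterating over it.
void eraseNegatives(std::list<int> &values) {
  for (auto it = values.begin(); it != values.end();) {
    if (*it < 0)
      it = values.erase(it); // erase() returns the iterator after the removed element
    else
      ++it;                  // only advance manually when nothing was erased
  }
}
```

Within LLVM, `llvm::make_early_inc_range` from `llvm/ADT/STLExtras.h` offers a similar guarantee for range-based loops by advancing the iterator before the loop body runs.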
Heejin Ahn	7829763e49	[WebAssembly] Improve readability of EH tests Summary: - Indent check lines to easily figure out try-catch-end structure - Add the original C++ code the tests were generated from - Add a few more lines to make the structure more readable - Rename a couple function / structures - Add label and branch annotations to cfg-stackify-eh.ll - Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll` because it will be updated in a later CL soon and there's no point in making it look better here Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58562 llvm-svn: 354842	2019-02-26 03:29:59 +00:00
Aaron Smith	1d5f8632d7	[CodeView] Emit HasConstructorOrDestructor class option for non-trivial constructors Reviewers: zturner, rnk, llvm-commits, aleksandr.urakov Reviewed By: zturner, rnk Subscribers: jdoerfert, majnemer, asmith Tags: #llvm Differential Revision: https://reviews.llvm.org/D44406 llvm-svn: 354841	2019-02-26 03:23:56 +00:00
Reid Kleckner	8b6af00173	[llvm-cov] Fix llvm-cov on Windows and un-XFAIL test Summary: The llvm-cov tool needs to be able to find coverage names in the executable, so the .lprfn and .lcovmap sections cannot be merged into .rdata. Also, the linker merges .lprfn$M into .lprfn, so llvm-cov needs to handle that when looking up sections. It has to support running on both relocatable object files and linked PE files. Lastly, when loading .lprfn from a PE file, llvm-cov needs to skip the leading zero byte added by the profile runtime. Reviewers: vsk Subscribers: hiraditya, #sanitizers, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58661 llvm-svn: 354840	2019-02-26 02:30:00 +00:00
Reid Kleckner	2f055f026a	[X86] Fix bug in x86_intrcc with arg copy elision Summary: Use a custom calling convention handler for interrupts instead of fixing up the locations in LowerMemArgument. This way, the offsets are correct when constructed and we don't need to account for them in as many places. Depends on D56883 Replaces D56275 Reviewers: craig.topper, phil-opp Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56944 llvm-svn: 354837	2019-02-26 02:11:25 +00:00
Sunil Srivastava	d72d16f444	Improve "llvm-nm -f sysv" output for Elf files Specifically, compute and Print Type and Section columns. Differential Revision: https://reviews.llvm.org/D58263 llvm-svn: 354833	2019-02-26 00:19:39 +00:00
Stanislav Mekhanoshin	ab25f1e65b	[AMDGPU] Added target to mir test. NFC. Test was used without -mcpu, although the tested instructions are not available on all ASICs. llvm-svn: 354830	2019-02-25 22:59:55 +00:00
Matt Arsenault	752579736e	RegBankSelect: Handle slightly more complex value mappings Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU. llvm-svn: 354828	2019-02-25 22:24:13 +00:00
Matt Arsenault	f4bfe4cd17	AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes llvm-svn: 354825	2019-02-25 21:32:48 +00:00
Roman Lebedev	6da9443890	Revert "[Support] Make raw_string_ostream unbuffered" Shame on me, did not run all the tests, bots are angry. This reverts commit r354819. llvm-svn: 354822	2019-02-25 21:11:19 +00:00
Simon Pilgrim	7166ab4704	[LangRef] .overflow intrinsics now support vectors We have all the necessary legalization, expansion and unrolling support required for the .overflow intrinsics with vector types, so update the docs to make that clear. Note: vectorization is not in place yet (the non-homogeneous return types aren't well supported) so we still must explicitly use the vector intrinsics and not rely on slp/loop. Differential Revision: https://reviews.llvm.org/D58618 llvm-svn: 354821	2019-02-25 21:05:09 +00:00
Roman Lebedev	0397f495c0	[Support] Make raw_string_ostream unbuffered Summary: In D58580 I noted that `llvm::to_string()` is a memory hog. It uses `raw_string_ostream`, and since it was buffered, every `raw_string_ostream` had a cost of `BUFSIZ` bytes (which is `8192` at least here). So every `llvm::to_string()` call, even to just print an `int`, cost `8192` bytes. In D58580, getting rid of that buffering //had// significant performance and memory consumption improvements for `llvm-xray convert`. Similarly, in D58580 @rnk pointed out that the `raw_svector_ostream` is already unbuffered, and `write_unsigned_impl` and friends do internal buffering. So it should be ok performance-wise to just make the `raw_string_ostream` itself unbuffered. Here, I don't have any perf measurements. Another letdown is that I'm leaving a loose end: not deleting the `flush()` method. I don't expect that cleanup to be anything more than just fixing every new compiler error, but I'm presently unable to do that. Will look into that later. Reviewers: rnk, zturner Reviewed By: rnk Subscribers: kristina, jdoerfert, llvm-commits, rnk Tags: #llvm Differential Revision: https://reviews.llvm.org/D58643 llvm-svn: 354819	2019-02-25 20:51:49 +00:00
Matt Arsenault	82b103998b	AMDGPU/GlobalISel: Clamp max implicit_def elements llvm-svn: 354818	2019-02-25 20:46:06 +00:00
Matt Arsenault	0b14857415	RegisterScavenger: Allow fail without spill AMDGPU wants to use this in some contexts where the spilling is either impossible, or a worse alternative to doing something else. llvm-svn: 354816	2019-02-25 20:29:04 +00:00
Matt Arsenault	f97ace5639	AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics EarlyCSE with MemorySSA was able to use this to merge multiple calls with no intervening store. llvm-svn: 354814	2019-02-25 20:16:11 +00:00
Matt Arsenault	84b3288853	GlobalISel: Make legalizer/regbankselect clear NoPHIs property If no phi existed in the original MIR and these introduced one, the verifier would fail. llvm-svn: 354813	2019-02-25 20:00:25 +00:00
Craig Topper	316c58e8f1	[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case. Differential Revision: https://reviews.llvm.org/D58475 llvm-svn: 354811	2019-02-25 19:42:47 +00:00
Andrea Di Biagio	4a1e59a6e0	Fix a sign compare warning breaking the -Werror build. The warning was introduced at r354793. llvm-svn: 354810	2019-02-25 19:33:58 +00:00
Matt Arsenault	fd6fd00773	AMDGPU: Correct definitions for bitset instructions These really read and write the result register, so these need a tied input. llvm-svn: 354809	2019-02-25 19:24:46 +00:00
Nikita Popov	fcbd7f6495	[Mips] Fix missing masking in fast-isel of br (PR40325) Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending (and x, 1) the condition before branching on it. To avoid regressing trivial cases, I'm combining emission of cmp+br sequences for the single-use + same block case (similar to what we do in x86). icmpbr1.ll still regresses due to the cross-bb usage of the condition. Differential Revision: https://reviews.llvm.org/D58576 llvm-svn: 354808	2019-02-25 18:54:17 +00:00
Amara Emerson	6bcfa1c419	[AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC. This is a preparatory change as I want to use emitScalarToVector() elsewhere, and in general we want to transition to MIRBuilder instead of using BuildMI directly. Differential Revision: https://reviews.llvm.org/D58528 llvm-svn: 354807	2019-02-25 18:52:54 +00:00
Philip Reames	a64de6720b	[Lanai] Be super conservative about atomics As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review. llvm-svn: 354800	2019-02-25 17:36:10 +00:00
Simon Pilgrim	80d0e9c563	[SelectionDAG] Add demanded elts variants to isConstOrConstSplat helpers. NFCI. These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well. We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that. This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D58503 llvm-svn: 354797	2019-02-25 16:31:58 +00:00
Simon Pilgrim	28441ac75f	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793	2019-02-25 16:02:01 +00:00
David Green	b504f104b2	[ARM] Add some more missing T1 opcodes for the peephole optimiser This adds a few extra Thumb1 opcodes to improve the peephole optimiser's ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791	2019-02-25 15:50:54 +00:00
Simon Pilgrim	a066f1f9e6	[Vectorizer] Add vectorization support for fixed smul/umul intrinsics This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix it's the third argument. Differential Revision: https://reviews.llvm.org/D58616 llvm-svn: 354790	2019-02-25 15:42:02 +00:00
Luke Cheeseman	59f77e7891	[AArch64] Add support for Cortex-A76 and Cortex-A76AE - Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788	2019-02-25 15:08:27 +00:00
Eugene Leviant	51c1f640aa	[llvm-objcopy] Add --add-symbol Differential revision: https://reviews.llvm.org/D58234 llvm-svn: 354787	2019-02-25 14:12:41 +00:00
Dmitri Gribenko	a3a3964f98	Fixed typos in tests: s/CHEKC/CHECK/ Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785	2019-02-25 13:41:59 +00:00
Simon Pilgrim	42bf2dd629	[TTI] Add generic cost model for smul/umul overflow intrinsics Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO. llvm-svn: 354784	2019-02-25 13:30:23 +00:00
Simon Pilgrim	f54186abb6	[SLPVectorizer][X86] Add fixed smul/umul tests Baseline tests - fixed mul intrinsics aren't flagged as vectorizable yet llvm-svn: 354783	2019-02-25 13:26:30 +00:00
Xing GUO	56d651db0f	[llvm-objdump] Add `Version References` dumper Summary: Add symbol version dumper for [#30241](https://bugs.llvm.org/show_bug.cgi?id=30241) Reviewers: jhenderson, MaskRay, kristina, emaste, grimar Reviewed By: jhenderson, grimar Subscribers: grimar, rupprecht, jakehehrlich, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54697 llvm-svn: 354782	2019-02-25 13:13:19 +00:00
Dmitri Gribenko	751c5fbf6a	Fixed typos in tests: s/CEHCK/CHECK/ Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781	2019-02-25 13:12:33 +00:00
Ganesh Gopalasubramanian	f03939fcc3	Test commit (remove a blank space) Change-Id: I69175571d3b1defeb85e96fdd87db5c3ccadcb63 llvm-svn: 354775	2019-02-25 12:27:49 +00:00
Simon Pilgrim	9caf0f0d15	[TTI] Add generic cost model for fixed point smul/umul Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline. Differential Revision: https://reviews.llvm.org/D57925 llvm-svn: 354774	2019-02-25 11:59:23 +00:00
Simon Pilgrim	c61f1e8e6c	[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results. Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB. Differential Revision: https://reviews.llvm.org/D58597 llvm-svn: 354771	2019-02-25 11:19:37 +00:00
James Henderson	fd99780c09	[yaml2obj]Re-allow dynamic sections to have raw content Recently, support was added to yaml2obj to allow dynamic sections to have a list of entries, to make it easier to write tests with dynamic sections. However, this change also removed the ability to provide custom contents to the dynamic section, making it hard to test malformed contents (e.g. because the section is not a valid size to contain an array of entries). This change reinstates this. An error is emitted if raw content and dynamic entries are both specified. Reviewed by: grimar, ruiu Differential Review: https://reviews.llvm.org/D58543 llvm-svn: 354770	2019-02-25 11:02:24 +00:00
Simon Tatham	b70fc0c5fd	[ARM] Make fullfp16 instructions not conditionalisable. More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768	2019-02-25 10:39:53 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis 
-benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
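The slowdown figures quoted at the end of the entry above can be reproduced from the "seconds time elapsed" values in its perf output. A minimal sketch; `percentChange` is a hypothetical helper, not part of llvm-exegesis.

```cpp
#include <cassert>

// Relative change from `baseline` to `measured`, expressed in percent.
double percentChange(double baseline, double measured) {
  assert(baseline > 0.0 && "baseline must be positive");
  return (measured - baseline) / baseline * 100.0;
}
```

`percentChange(0.390629, 0.40077)` gives about 2.6 for the single-sweep file (10099 points), and `percentChange(6.8118, 6.9507)` gives about 2.0 for the five-sweep file (50572 points), matching the +2.6% and +2% stated in the message. The reported run-to-run deviations range from 0.25% to 1.01%.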
Roman Lebedev	49b6f81a74	[XRay][tools] Revert "Use Support/JSON.h in llvm-xray convert" Summary: This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert Abstractions are great. Readable code is great. JSON support library is a good idea. However unfortunately, there is an internal detail that one needs to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`. So for every `llvm::json::Object`, even if you only store a single `int` entry there, you pay the whole price of `llvm::DenseMap`. Unfortunately, it matters for `llvm-xray`. I was trying to analyse the `llvm-exegesis` analysis mode performance, and for that i wanted to view the LLVM X-Ray log visualization in Chrome trace viewer. And the `llvm-xray convert` is sluggish, and sometimes even ended up being killed by OOM. `xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis` (compiled with ` -fxray-instruction-threshold=128`) analysis mode over `-benchmarks-file` with 10099 points (one full latency measurement set), with normal runtime of 0.387s. 
Timings: Old: (copied from D58580) ``` $ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs): 21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% ) 314 context-switches # 14.701 M/sec ( +- 59.13% ) 1 cpu-migrations # 0.037 M/sec ( +-100.00% ) 2181354 page-faults # 102191.251 M/sec ( +- 0.02% ) 85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%) 14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%) 32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%) 67896890228 instructions # 0.79 insn per cycle # 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%) 14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%) 212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%) 21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% ) ``` New: ``` $ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs): 7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% ) 182 context-switches # 25.402 M/sec ( +- 28.84% ) 0 cpu-migrations # 0.046 M/sec ( +- 70.71% ) 33701 page-faults # 4694.994 M/sec ( +- 0.88% ) 28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%) 2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%) 10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%) 36199132874 instructions # 1.26 insn per cycle # 0.30 stalled 
cycles per insn ( +- 0.03% ) (50.02%) 6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%) 73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%) 7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% ) ``` So using `llvm::json` nearly triples run-time on that test case. (+3x is times, not percent.) Memory: Old: ``` total runtime: 39.88s. bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s) calls to allocation functions: 33267816 (834135/s) temporary memory allocations: 5832298 (146235/s) peak heap memory consumption: 9.21GB peak RSS (including heaptrack overhead): 147.98GB total memory leaked: 1.09MB ``` New: ``` total runtime: 17.42s. bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s) calls to allocation functions: 21382982 (1227284/s) temporary memory allocations: 232858 (13364/s) peak heap memory consumption: 350.69MB peak RSS (including heaptrack overhead): 2.55GB total memory leaked: 79.95KB ``` Diff: ``` total runtime: -22.46s. bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s) calls to allocation functions: -11884834 (529155/s) temporary memory allocations: -5599440 (249307/s) peak heap memory consumption: -8.86GB peak RSS (including heaptrack overhead): 0B total memory leaked: -1.01MB ``` So using `llvm::json` increases peak memory consumption on this testcase ~+27x. And total allocation count +15x. Both of these numbers are times, not percent. And note that memory usage is clearly unbound with `llvm::json`, it directly depends on the length of the log, so peak memory consumption is always increasing. This isn't so with the dumb code, there is no accumulating memory consumption, peak memory consumption is fixed. Naturally, that means it will handle much larger logs without OOM'ing. Readability is good, but the price is simply unacceptable here. Too bad none of this analysis was done as part of the development/review D50129 itself. 
Reviewers: dberris, kpw, sammccall Reviewed By: dberris Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58584 llvm-svn: 354764	2019-02-25 07:39:07 +00:00
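The multipliers in the entry above follow from the quoted measurements. A small sketch for checking them; `ratio` and `gbToMb` are hypothetical helpers, and the GB-to-MB factor of 1024 is an assumption about the units heaptrack reports.

```cpp
#include <cassert>

// How many times larger `oldValue` is than `newValue`.
double ratio(double oldValue, double newValue) {
  assert(newValue > 0.0 && "cannot divide by zero");
  return oldValue / newValue;
}

// Convert gigabytes to megabytes, assuming 1 GB = 1024 MB.
double gbToMb(double gigabytes) {
  return gigabytes * 1024.0;
}
```

`ratio(21.3502, 7.1807)` is about 2.97, the "nearly triples" run-time figure. `ratio(gbToMb(9.21), 350.69)` is about 26.9 (about 26.3 with a factor of 1000), the roughly 27x peak heap figure. `ratio(79.07, 5.12)` is about 15.4, so the 15x figure corresponds to total bytes allocated; the number of calls to allocation functions, `ratio(33267816, 21382982)`, differs by only about 1.56x.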
Craig Topper	8c9724ea4f	[SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair. OPC_CheckCondCode is always used as operand 2 of a setcc. And it's always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2. This saves ~3000 bytes in the X86 table. llvm-svn: 354763	2019-02-25 03:11:44 +00:00
Kang Zhang	4faa4090c9	[PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts Summary: Fast selection of the LLVM fptoi and fptrunc instructions does not handle VSX instruction support well. We'd use the VSX float-to-integer conversion instruction instead of the non-VSX one if the operand register class is VSSRC or VSFRC, because i32 and i64 are mapped to VSSRC and VSFRC respectively when the VSX feature is enabled. For the float truncation instruction, we do similar work to try to use a VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762	2019-02-25 02:46:16 +00:00
Nikita Popov	b7918f3c14	[InstCombine] Add tests for PR40846; NFC The icmps are the same as the overflow result of the intrinsic. llvm-svn: 354760	2019-02-24 21:55:37 +00:00
Nikita Popov	bdefe47857	[InstCombine] Move with.overflow tests to separate file; NFC And regenerate checks. I had to rename some variables, because update_test_checks can't deal with the same variable names used in lower and upper case. I've also dropped the result type aliases, as just using the type directly gives a cleaner result. llvm-svn: 354759	2019-02-24 21:55:31 +00:00
Simon Pilgrim	f43c48cb52	[X86] Add PR40483 test cases Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops llvm-svn: 354758	2019-02-24 21:13:29 +00:00
Simon Pilgrim	cfaf663a35	[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step. llvm-svn: 354757	2019-02-24 19:57:52 +00:00
Craig Topper	3fe4bd464c	[X86] Fix tls variable lowering issue with large code model Summary: The problem here is the lowering for tls variable. Below is the DAG for the code. SelectionDAG has 11 nodes: t0: ch = EntryToken t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64 t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 t12: i64 = add t8, t11 t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64 t6: ch = CopyToReg t0, Register:i32 %0, t4 And when mcmodel is large, below instruction can NOT be folded. t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0" When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP. Fixes PR26906 Patch by LuoYuanke Reviewers: craig.topper, rnk, annita.zhang, wxiao3 Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58336 llvm-svn: 354756	2019-02-24 19:33:37 +00:00
Craig Topper	5532a98737	[X86][SSE] Use pblendw for v4i32/v2i64 during isel. Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was an integer. So use pblendw instead. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58574 llvm-svn: 354755	2019-02-24 19:23:41 +00:00
Craig Topper	ce2bd19c49	[X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake. Summary: The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate. The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate. Reviewers: RKSimon, andreadb Reviewed By: RKSimon Subscribers: gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58581 llvm-svn: 354754	2019-02-24 19:23:39 +00:00
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Sanjay Patel	26aa702463	[InstCombine] add test for icmp+add fold; NFC llvm-svn: 354750	2019-02-24 17:31:15 +00:00
Simon Pilgrim	4f4f9abdfa	[X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC. Name better matches the other similar 'lane permute' and 'repeated mask' functions we have. llvm-svn: 354749	2019-02-24 17:30:06 +00:00
Sanjay Patel	9907d3c8b4	[InstCombine] canonicalize add/sub with bool add A, sext(B) --> sub A, zext(B) We have to choose 1 of these forms, so I'm opting for the zext because that's easier for value tracking. The backend should be prepared for this change after: D57401 rL353433 This is also a preliminary step towards reducing the amount of bit hackery that we do in IR to optimize icmp/select. That should be waiting to happen at a later optimization stage. The seeming regression in the fuzzer test was discussed in: D58359 We were only managing that fold in instcombine by luck, and other passes should be able to deal with that better anyway. llvm-svn: 354748	2019-02-24 16:57:45 +00:00
Sanjay Patel	986a024c19	[InstCombine] regenerate checks; NFC llvm-svn: 354747	2019-02-24 16:11:58 +00:00
Sanjay Patel	cb04ba032f	[CGP] add special-cases to form unsigned add with overflow (PR40486) There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746	2019-02-24 15:31:27 +00:00
Simon Pilgrim	9b49f36a03	Fix "enumeral and non-enumeral type in conditional expression" gcc7 warning. NFCI. llvm-svn: 354745	2019-02-24 13:31:52 +00:00
Heejin Ahn	20cf0749cb	[WebAssembly] Rename a variable in CFGStackify (NFC) llvm-svn: 354744	2019-02-24 08:30:06 +00:00
Heejin Ahn	25d924b41f	[WebAssembly] Merge two identical switch case routines into one (NFC) llvm-svn: 354743	2019-02-24 08:19:55 +00:00
Philip Reames	33d7e49bb7	[Hexagon, SystemZ] Be super conservative about atomics As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review. llvm-svn: 354740	2019-02-24 00:45:09 +00:00
Duncan P. N. Exon Smith	e7b9464943	VFS: Avoid some unnecessary std::string copies Thread Twine a little deeper through the VFS to avoid unnecessarily constructing the same std::string twice in a parameter sequence: Twine -> std::string -> StringRef -> std::string Changing a few parameters from StringRef to Twine avoids the early call to `Twine::str()`. llvm-svn: 354739	2019-02-23 23:48:47 +00:00
Craig Topper	dc185522fb	[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands. The new instruction might have fewer operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist. X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here. Really this whole checking for more commutable operands is a little fragile. It assumes that the new instruction's operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change. llvm-svn: 354738	2019-02-23 21:41:44 +00:00
Craig Topper	be9eeb5526	Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" And its follow ups r354511, r354640. A follow patch will fix the issue that caused it to be reverted. llvm-svn: 354737	2019-02-23 21:41:42 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Nikita Popov	e661f946a7	[WebAssembly] Fix select of and (PR40805) Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by patterns added in D53676. I'm removing the patterns entirely here, as they are not correct in the general case. If necessary something more specific can be added in the future. Differential Revision: https://reviews.llvm.org/D58575 llvm-svn: 354733	2019-02-23 18:59:01 +00:00
Simon Pilgrim	f383a47b7d	[X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x) D58053/rL354340 added this to EltsFromConsecutiveLoads directly llvm-svn: 354732	2019-02-23 18:53:03 +00:00
Simon Pilgrim	398d0b9e96	Fix MSVC constant truncation warnings. NFCI. llvm-svn: 354731	2019-02-23 18:49:02 +00:00
Simon Pilgrim	e08f177ea2	[X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x) For AVX1, limit this to i32/f32/i64/f64 loading cases only. llvm-svn: 354730	2019-02-23 18:34:05 +00:00
Simon Pilgrim	31793733a0	[X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 it's just a single VPERMPD), followed by a BLENDPD. llvm-svn: 354729	2019-02-23 17:10:47 +00:00
Simon Dardis	86a589e38d	[MIPS] Fix an incorrect test. (NFC) This test is incorrect as it should be using the microMIPSR6 instruction to return, not the microMIPS version. llvm-svn: 354726	2019-02-23 15:56:32 +00:00
Craig Topper	75afc0105c	[X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel. Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost. The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results. llvm-svn: 354724	2019-02-23 08:34:10 +00:00
Michael Trent	7dcfac6171	objdump fails to parse Mach-O binaries with n_desc bearing stabs Summary: The objdump Mach-O parser uses MachOObjectFile::checkSymbolTable() to verify the symbol table is in a legal state before dereferencing the offsets in the table. This routine missed a test for N_STAB symbols when validating the two-level name space library ordinal for undefined symbols. If the binary in question contained a value in the n_desc high byte that is larger than the list of loaded dylibs, checkSymbolTable() will flag the library ordinal as being out of range. Most of the time the n_desc field is set to 0 or to small values, but old final linked binaries exist with N_STAB symbols bearing non-trivial n_desc fields. The change here is simply to verify a symbol is not an N_STAB symbol before consulting the values of n_other or n_desc. rdar://44977336 Reviewers: lhames, pete, ab Reviewed By: pete Subscribers: llvm-commits, rupprecht Tags: #llvm Differential Revision: https://reviews.llvm.org/D58568 llvm-svn: 354722	2019-02-23 06:19:56 +00:00
Daniel Sanders	6ac16e91f6	Try again to fix memory leak in r354692 The previous one didn't fix everything. llvm-svn: 354719	2019-02-23 03:25:37 +00:00
Jordan Rupprecht	6387fa2715	[NFC] Fix typos: preceeding -> preceding llvm-svn: 354715	2019-02-23 01:28:32 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Daniel Sanders	f250cf8b41	Fix memory leak in r354692 llvm-svn: 354712	2019-02-23 01:13:35 +00:00
Craig Topper	62619d064d	[LegalizeTypes] Use PromoteTargetBoolean in PromoteIntOp_ADDSUBCARRY instead of reimplementing it. NFCI llvm-svn: 354710	2019-02-23 00:38:19 +00:00
Craig Topper	a9697f24cf	[X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalizer to 128-bits. If the input type will be promoted to 128 bits it's better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register. Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization. llvm-svn: 354709	2019-02-23 00:35:02 +00:00
Craig Topper	b95ca56361	[X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits. If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg. If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half. llvm-svn: 354708	2019-02-23 00:34:58 +00:00
Sam Clegg	275d15ecf3	[WebAssembly] Update CodeGen test expectations after rL354697. NFC llvm-svn: 354705	2019-02-23 00:07:39 +00:00
Konstantin Zhuravlyov	9a278bf6b5	Revert "AMDGPU/NFC: Cleanup subtarget predicates" It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700	2019-02-22 23:21:06 +00:00
Sanjay Patel	973143ab79	[CGP] add tests for uaddo increment/decrement; NFC llvm-svn: 354699	2019-02-22 23:19:34 +00:00
Sam Clegg	8fffa1dfa3	[WebAssembly] Remove unneeded MCSymbolRefExpr variants We record the type of the symbol (event/function/data/global) in the MCWasmSymbol and so it should always be clear how to handle a relocation based on the symbol itself. The exception is a function which still needs the special @TYPEINDEX then the relocation contains the signature rather than the address of the functions. Differential Revision: https://reviews.llvm.org/D58472 llvm-svn: 354697	2019-02-22 22:29:34 +00:00
Sam Clegg	ffba00bd47	[WebAssembly] MC: Handle aliases of aliases Differential Revision: https://reviews.llvm.org/D58417 llvm-svn: 354694	2019-02-22 21:41:42 +00:00
David Greene	3b9141df25	[CMake] Honor LLVM_EXTERNAL_<proj>_SOURCE_DIR When LLVM_ENABLE_PROJECTS is set, CMake assumes the project directories are all side-by-side. This is not always the case and there's no reason to expect it if LLVM_EXTERNAL_<proj>_SOURCE_DIR is set. Honor that setting if it exists and allow the build configuration to continue. Differential Revision: https://reviews.llvm.org/D49672 llvm-svn: 354693	2019-02-22 21:19:48 +00:00
Daniel Sanders	07cda257f8	Restore ability for C++ API users to Enable IPRA. Summary: Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying the TargetOptions::EnableIPRA. This no longer works on current trunk since the useIPRA() hook overrides any values that are set in advance. This patch adjusts the behaviour of the hook so that API users and useIPRA() can both enable it but useIPRA() cannot disable it if the API user already enabled it. Reviewers: arsenm Reviewed By: arsenm Subscribers: wdng, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D38043 llvm-svn: 354692	2019-02-22 20:59:07 +00:00
Sanjay Patel	ffe1cf5e92	[CGP] move overflow intrinsic insertion to common location; NFCI We need to enhance the uaddo matching to handle special-cases as seen in PR40486 and PR31754. That means we won't necessarily have a def-use pattern, so we'll need to check dominance to determine where to place the intrinsic (as we already do for usubo). This preliminary patch is just rearranging the code, so the planned follow-up to improve uaddo will be more clear. llvm-svn: 354689	2019-02-22 20:20:24 +00:00
Matt Arsenault	7b55066a34	MIR: Preserve incoming frame index numbers Don't skip incrementing the frame index number if the object is dead. Instructions can still be referencing the old frame index number, and this doesn't attempt to remap those. The resulting MIR then fails to load because the use instructions use a higher frame index number than recorded list of stack objects. I'm not sure it's possible to craft a testcase with the existing set of passes. It requires selectively marking some stack objects dead in an essentially random order. StackSlotColoring condenses towards the low indexes. This avoids a regression in a future AMDGPU commit when some frame indexes are lowered separately from PEI. llvm-svn: 354688	2019-02-22 19:30:38 +00:00
Matt Arsenault	6d05d6a7b6	CodeGen: Make RegAllocRegistry a template class Will allow re-using the machinery for independent sets of register allocators. This will allow AMDGPU to use separate command line options for the allocator to use for SGPRs separate from VGPRs. llvm-svn: 354687	2019-02-22 19:16:52 +00:00
Matt Arsenault	476e26b5d3	AMDGPU: Use removeAllRegUnitsForPhysReg llvm-svn: 354686	2019-02-22 19:03:36 +00:00
Matt Arsenault	45cfe9822d	LiveIntervals: Add removeAllRegUnitsForPhysReg Convenience wrapper for removing the reg units of a physical register. llvm-svn: 354685	2019-02-22 19:03:31 +00:00
Sam Clegg	a5e68748bf	[WebAssembly] Remove debug statement submitted in rL354657 Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58549 llvm-svn: 354684	2019-02-22 19:00:03 +00:00
Mitch Phillips	cb0c05cbeb	[GN] Updated build file to allow GN builds to succeed at ToT. llvm-svn: 354683	2019-02-22 18:45:41 +00:00
Guozhi Wei	4c8e480358	[MBP] Factor out function hasViableTopFallthrough and enhancement This patch factors out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. The original code checks only if there is a block that can be placed before the current loop top. This patch also checks if the loop top is the most likely successor of its predecessor. The attached test case shows its effect. Differential Revision: https://reviews.llvm.org/D58393 llvm-svn: 354682	2019-02-22 18:04:37 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Sanjay Patel	a9e289174a	[x86] allow narrowing of vector UINT_TO_FP As discussed in: D56864 D58197 Always use the narrow (128-bit) instruction when possible. We already had the signed int version of this transform. llvm-svn: 354675	2019-02-22 15:47:45 +00:00
Sanjay Patel	1baf7896cc	[x86] simplify code in combineExtractSubvector; NFC Only the 1st fold is attempted pre-legalization, but it requires legal (simple) types too, so we don't need an EVT in any of the code. llvm-svn: 354674	2019-02-22 15:28:22 +00:00
Matt Arsenault	65b4ab9921	BreakCriticalEdges: Update PostDominatorTree llvm-svn: 354673	2019-02-22 15:01:41 +00:00
Petar Jovanovic	6083106b12	[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM Filling a delay slot in 32bit jump instructions with a 16bit instruction can cause issues. According to the documentation such an operation is unpredictable. This patch adds opcode Mips::PseudoIndirectBranch_MM alongside Mips::PseudoIndirectBranch and other instructions that are expanded to jr instruction and do not allow a 16bit instruction in their delay slots. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58507 llvm-svn: 354672	2019-02-22 14:53:58 +00:00
Roman Tereshin	99a6672bba	[LowerSwitch][AMDGPU] Do not handle impossible values This patch adds LazyValueInfo to LowerSwitch to compute the range of the value being switched over and reduce the size of the tree LowerSwitch builds to lower a switch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D58096 llvm-svn: 354670	2019-02-22 14:33:46 +00:00
Chijun Sima	70e97163e0	[DTU] Refine the interface and logic of applyUpdates Summary: This patch separates two semantics of `applyUpdates`: 1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update. 2. User provides a sequence of hints. Updates mentioned in this sequence might never have happened and might even be duplicated. Logic changes: Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example, ``` DTU(Lazy) and Edge A->B exists. 1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued 2. Remove A->B 3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended) ``` But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue. Interface changes: The second semantic of `applyUpdates` is separated to `applyUpdatesPermissive`. These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`. Reviewers: kuhar, brzycki, dmgreen, grosser Reviewed By: brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58170 llvm-svn: 354669	2019-02-22 13:48:38 +00:00
David Green	acb628b2af	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667	2019-02-22 12:23:31 +00:00
Diana Picus	35e1c6663c	[ARM GlobalISel] Support floating point for Thumb2 This is exactly the same as arm mode, so for the instruction selector tests we just extract them to a new file and run with the same checks for both arm and thumb mode. For the legalizer we need to update the tests for soft float a bit, but only because BL and tBL are slightly different. We could be pedantic and check that we get a well-formed BL for arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to just skip over the predicate operands in the checks. Also note that we have the pedantic checks in the divmod test, so we're covered. llvm-svn: 354665	2019-02-22 09:54:54 +00:00
George Rimar	d22686b637	Fix BB after r354661 Update 2 test cases after obj2yaml fix in r354661. llvm-svn: 354663	2019-02-22 08:58:23 +00:00
George Rimar	11358dd65d	[obj2yaml] - Do not miss section index for special symbols. This fixes https://bugs.llvm.org/show_bug.cgi?id=40786 ("obj2yaml symbol output missing section index for SHN_ABS and SHN_COMMON symbols") Since SHN_ABS and SHN_COMMON symbols are special, we should preserve the st_shndx for them. The patch does this for them and the other special symbols. The test case is based on the test provided by James Henderson at the bug page! Differential revision: https://reviews.llvm.org/D58498 llvm-svn: 354661	2019-02-22 08:45:21 +00:00
Alina Sbirlea	151100787d	[MemorySSA] Update test with minimized one. NFCI llvm-svn: 354658	2019-02-22 07:34:54 +00:00
Heejin Ahn	85631d8b50	[WebAssembly] Remove getBottom function from CFGStackify (NFC) Summary: This removes `getBottom` function and the bookkeeping map of <begin marker instruction, bottom BB>. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58319 llvm-svn: 354657	2019-02-22 07:19:30 +00:00
Alina Sbirlea	90d2e3a16d	[MemorySSA & LoopPassManager] Resolve PR40038. The correct edge being deleted is not to the unswitched exit block, but to the original block before it was split. That's the key in the map, not the value. The insert is correct. The new edge is to the .split block. The splitting turns OriginalBB into: OriginalBB -> OriginalBB.split. Assuming the original CFG edge: ParentBB->OriginalBB, we must now delete ParentBB->OriginalBB, not ParentBB->OriginalBB.split. llvm-svn: 354656	2019-02-22 07:18:37 +00:00
Craig Topper	fa6187d230	[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads. When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits. We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results. So we can just emit the final AND after the optional concatenation is done. That will handle skipping before the OR and get rid of extra high bits after the OR. llvm-svn: 354655	2019-02-22 07:03:25 +00:00
Craig Topper	069cf05e87	[LegalizeVectorOps] Simplify the non-byte sized load handling VectorLegalizer::ExpandLoad. NFCI Remove an if that should always be true. Merge the body of another into the only block that could make the if true. llvm-svn: 354654	2019-02-22 06:18:33 +00:00
Craig Topper	0ca023b3b7	[X86] Add test cases to cover the path in VectorLegalizer::ExpandLoad for non-byte sized loads where bits from two loads need to be concatenated. If the scalar type doesn't divide evenly into the WideVT then the code will need to take some bits from adjacent scalar loads and combine them. But most of our testing is for i1 element type which always divides evenly. llvm-svn: 354653	2019-02-22 06:18:32 +00:00
Chijun Sima	f131d6110e	[DTU] Deprecate insertEdge/deleteEdge Summary: This patch converts all existing `insertEdge/deleteEdge` to `applyUpdates` and marks `insertEdge/deleteEdge` as deprecated. Reviewers: kuhar, brzycki Reviewed By: kuhar, brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58443 llvm-svn: 354652	2019-02-22 05:41:43 +00:00
Lang Hames	de9b30db3d	Fix a think-o in the disable-kaleidoscope-tests-on-windows predicate of r354646. llvm-svn: 354650	2019-02-22 03:56:50 +00:00
Matt Arsenault	0280a5e143	DAG: Add helper for creating shifts with correct type llvm-svn: 354649	2019-02-22 03:38:47 +00:00
Craig Topper	3a391fc0e8	[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit. Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64. llvm-svn: 354648	2019-02-22 01:49:53 +00:00
Craig Topper	be22f329a9	[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract. Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those. But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results. By promoting the input at the same time we can create only a single any_extend or truncate. There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch. llvm-svn: 354647	2019-02-22 01:49:50 +00:00
Lang Hames	4a7db8cb90	Add 'Windows' to the disabled platforms list for the Kaleidoscope tests. Expands on the check from r354645. llvm-svn: 354646	2019-02-22 01:44:23 +00:00

175839 Commits