llvm-project

Commit Graph

Author	SHA1	Message	Date
Zvi Rackover	d7a1c334ce	DAGCombine: Combine BUILD_VECTOR to TRUNCATE Summary: Add a combine for creating a truncate to replace a build_vector composed of extracts with indices that form a stride-2^N series. Example: v8i32 V = ... v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6)) --> v4i32 truncate (bitcast V to v4i64) Related discussion in llvm-dev about canonicalizing shuffles to truncates in LLVM IR: http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html. Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena Reviewed By: delena Subscribers: guyblank, delena, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D34077 llvm-svn: 307036	2017-07-03 15:47:40 +00:00
Sanjay Patel	e9b1d16a8c	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. There were also over-specifications in the RUN params such as CPU model. llvm-svn: 307033	2017-07-03 15:27:19 +00:00
Sanjay Patel	d3173740fd	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. There were also several over-specifications in the RUN params such as CPU model or OS requirement llvm-svn: 307028	2017-07-03 15:04:05 +00:00
Simon Pilgrim	decfaca033	[X86][SSE4A] Add tests showing missed opportunities to combine EXTRQI/INSERTQI shuffles llvm-svn: 307027	2017-07-03 15:01:07 +00:00
Alexander Timofeev	ea7f08bee5	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Sanjay Patel	dab798a25f	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 307024	2017-07-03 14:29:45 +00:00
Sanjay Patel	448095c19b	[InstCombine] move and improve tests for cmp-intrinsic; NFC llvm-svn: 307022	2017-07-03 14:07:40 +00:00
Benjamin Kramer	fb620493e1	Revert "[GVN] Recommit the patch "Add phi-translate support in scalarpre"." This reverts commit r306313. This breaks selfhost at -O3 and PR33652. Let me know if you need additional information on reproducing the issue. llvm-svn: 307021	2017-07-03 12:23:10 +00:00
Igor Breger	5c787ab346	[GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection. llvm-svn: 307019	2017-07-03 11:06:54 +00:00
Craig Topper	766ce6e9cf	[InstCombine] Support BITWISE_OP( BSWAP(x), CONSTANT ) -> BSWAP( BITWISE_OP(x, BSWAP(CONSTANT) ) ) for splat vectors. llvm-svn: 307002	2017-07-03 05:54:15 +00:00
Craig Topper	1a79c38d5e	[InstCombine] Add test cases for BITWISE_OP( BSWAP(x), CONSTANT ) -> BSWAP( BITWISE_OP(x, BSWAP(CONSTANT) ) ) with splat vectors. NFC llvm-svn: 307001	2017-07-03 05:54:14 +00:00
Craig Topper	1e4643a98e	[InstCombine] Support BITWISE_OP(BSWAP(A),BSWAP(B))->BSWAP(BITWISE_OP(A, B)) for vectors. llvm-svn: 306999	2017-07-03 05:54:13 +00:00
Craig Topper	960ce1ee20	[InstCombine] Add test cases showing missed opportunity to fold BITWISE_OP(BSWAP(A),BSWAP(B))->BSWAP(BITWISE_OP(A, B)) for vectors. NFC llvm-svn: 306998	2017-07-03 05:54:12 +00:00
Matt Arsenault	3f031e75aa	AMDGPU: Add operand target flags serialization llvm-svn: 306995	2017-07-02 23:21:48 +00:00
Simon Pilgrim	f05c5ef441	[X86][AVX512] Test AVX512VPOPCNTDQ CTPOP with/without AVX512BW llvm-svn: 306991	2017-07-02 19:52:20 +00:00
Simon Pilgrim	a9655ffb42	[X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back. llvm-svn: 306990	2017-07-02 19:32:37 +00:00
Simon Pilgrim	3f5ed96f92	[X86][AVX512] Cleanup tzcnt tests triples and attributes Avoid use of specific -mcpu llvm-svn: 306989	2017-07-02 18:51:48 +00:00
Simon Pilgrim	df55dd09d6	[X86][AVX512] Cleanup popcnt tests triples and attributes Avoid use of specific -mcpu llvm-svn: 306988	2017-07-02 18:35:22 +00:00
Sanjay Patel	b51e072d35	[InstCombine] fix crash when folding cmp+bswap vector We assumed the constant was a scalar when creating the replacement operand. Also, improve tests for this fold and move the tests for this fold to their own file. I'll move the related and missing tests to this file as a follow-up. llvm-svn: 306985	2017-07-02 16:05:11 +00:00
Sanjay Patel	7d263c1a27	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 306984	2017-07-02 15:24:08 +00:00
Sanjay Patel	dd076f0178	[x86] remove unnecessary RUN for test after auto-generating checks; NFC llvm-svn: 306983	2017-07-02 15:16:17 +00:00
Sanjay Patel	c22223e6cd	[x86] update test to use FileCheck and auto-generate checks; NFC llvm-svn: 306982	2017-07-02 15:15:18 +00:00
Sanjay Patel	27cccc96c2	[x86] auto-generate complete checks for tests; NFC These all used 'CHECK-NOT' which isn't necessary if we have complete checks. llvm-svn: 306981	2017-07-02 14:50:35 +00:00
Sanjay Patel	c3d5cf0bb7	[InstCombine] look through bswap/bitreverse for equality comparisons I noticed this missed bswap optimization in the CGP memcmp() expansion, and then I saw that we don't have the fold in InstCombine. Differential Revision: https://reviews.llvm.org/D34763 llvm-svn: 306980	2017-07-02 14:34:50 +00:00
NAKAMURA Takumi	91a00b974d	llvm/test/Transforms/LoopVectorize/X86/slm-no-vectorize.ll: -debug is available in +Asserts. llvm-svn: 306979	2017-07-02 14:25:27 +00:00
Simon Pilgrim	8971b2904e	[X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive llvm-svn: 306978	2017-07-02 14:16:25 +00:00
Simon Pilgrim	4cb5613c38	[X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive The 32-bit shuffles are a bit tricky and will be dealt with in a later patch llvm-svn: 306977	2017-07-02 13:19:10 +00:00
Simon Pilgrim	638af5f1c4	[X86][SSE] Add test showing missed opportunity to combine to pshuflw We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive llvm-svn: 306976	2017-07-02 12:56:10 +00:00
Mohammed Agabaria	eb09a810e6	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch this patch updates the cost of addq\subq (add\subtract of vectors of 64bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974	2017-07-02 12:16:15 +00:00
Gadi Haber	dc25c2b08b	[X86] Rerun "update_llc_test_checks" tool on CodeGen tests. NFC. This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch. Minor differences due to added "End of Function" lines. Reviewers: zvi Differential Revision: https://reviews.llvm.org/D34933 llvm-svn: 306973	2017-07-02 12:01:33 +00:00
Igor Breger	717bd36c83	[GlobalISel][X86] Support G_GLOBAL_VALUE operation. Summary: Support G_GLOBAL_VALUE operation. For now most of the PIC configurations not implemented yet. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34738 Conflicts: test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir llvm-svn: 306972	2017-07-02 08:58:29 +00:00
Igor Breger	b186a69aa5	[GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection. Summary: Support vector type G_UNMERGE_VALUES selection. For now G_UNMERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Reviewers: t.p.northover, qcolombet, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33665 llvm-svn: 306971	2017-07-02 08:15:49 +00:00
Hiroshi Inoue	bb703e8960	fix trivial typos; NFC suport -> support llvm-svn: 306968	2017-07-02 03:24:54 +00:00
Craig Topper	f60ab47098	[InstCombine] Fold (a \| b) ^ (~a \| ~b) --> ~(a ^ b) and (a & b) ^ (~a & ~b) --> ~(a ^ b) Summary: I came across this while thinking about what would happen if one of the operands in this xor pattern was itself a inverted (A & ~B) ^ (~A & B)-> (A^B). The patterns here assume that the (~a \| ~b) will be demorganed to ~(a & b) first. Though I wonder if there's a multiple use case that would prevent the demorgan. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34870 llvm-svn: 306967	2017-07-02 01:15:51 +00:00
Simon Pilgrim	3bad6f3167	[X86][RDSEED] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well. llvm-svn: 306961	2017-07-01 16:42:16 +00:00
Simon Pilgrim	2d320161e5	[X86][RDRAND] Split off i64 intrinsic tests and test i16/i32 on 32-bit target as well. llvm-svn: 306960	2017-07-01 16:41:12 +00:00
Simon Pilgrim	2b679e1812	[X86] Removed reference to update_test_checks.py llvm-svn: 306959	2017-07-01 16:34:29 +00:00
Simon Pilgrim	ad7f0844ea	[X86][AVX] Remove duplicate autogeneration note llvm-svn: 306958	2017-07-01 16:32:02 +00:00
Hiroshi Inoue	ef1c2ba22a	fix trivial typos, NFC llvm-svn: 306952	2017-07-01 07:12:15 +00:00
Dylan McKay	5df526210a	[AVR] Remove a bunch of now-obselete tests The fixups in these instructions are now lowered into relocations. llvm-svn: 306947	2017-07-01 05:23:13 +00:00
Eric Christopher	3df231a1f7	Remove the default ARMSubtarget from the ARM TargetMachine. This enables us to ensure better LTO and code generation in the face of module linking. Remove a report_fatal_error from the TargetMachine and replace it with an assert in ARMSubtarget - and remove the test that depended on the error. The assertion will still fire in the case that we were reporting before, but error reporting needs to be in front end tools if possible for options parsing. llvm-svn: 306939	2017-07-01 03:41:53 +00:00
Davide Italiano	9282f1aece	[Cloner] Re-map simplfied cloned instructions. This commit pretty much rolls back the logic added in r306495 as in the testcase provided we simplify an `icmp` looking through a PHI that hasn't been mapped yet. I think instsimplify shouldn't do threading over select/phis or just looking through phis in general, but this is what we have now. Also, add a test to prevent this from happening in case somebody wants to modify this code again. Briefly discussed with Kyle Butt (thanks Kyle!). llvm-svn: 306938	2017-07-01 03:29:33 +00:00
Teresa Johnson	32d95742b8	Recommit "r306541 - Add zero-length check to memcpy/memset load store loop expansion"" With fix for use-after-free errors. We can't add the new branch and remove the old one until we are done with the Builder constructed for the block. llvm-svn: 306937	2017-07-01 03:24:10 +00:00
Teresa Johnson	c12306c0ad	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306936	2017-07-01 03:24:09 +00:00
Teresa Johnson	eb4fba9d61	re-commit r306336: Enable vectorizer-maximize-bandwidth by default. Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306935	2017-07-01 03:24:08 +00:00
Teresa Johnson	de56903bde	revert r306336 for breaking ppc test. llvm-svn: 306934	2017-07-01 03:24:07 +00:00
Teresa Johnson	1fbaffeba1	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306933	2017-07-01 03:24:06 +00:00
Eric Christopher	015dc2094e	Rewrite ARM execute only support to avoid the use of a command line flag and unqualified ARMSubtarget lookup. Paired with a clang commit to use the new behavior. llvm-svn: 306927	2017-07-01 02:55:22 +00:00
Brian Gesiak	4ef3daafef	[ORE] Add diagnostics hotness threshold Summary: Add an option to prevent diagnostics that do not meet a minimum hotness threshold from being output. When generating optimization remarks for large codebases with a ton of cold code paths, this option can be used to limit the optimization remark output at a reasonable size. Discussion of this change can be read here: http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html Reviewers: anemet, davidxl, hfinkel Reviewed By: anemet Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D34867 llvm-svn: 306912	2017-06-30 23:14:53 +00:00
Zachary Turner	af8c75a8c0	[llvm-pdbutil] Output the symbol offset when dumping. Type records have a unique type index, but symbol records do not. Instead, symbol records refer to other symbol records by referencing their offset in the symbol stream. In a sense this is the analogue of the TypeIndex, but we are not printing it in the dumper. Printing it not only gives us more useful information when manually investigating the contents of a PDB, but also allows us to write better tests by enabling us to verify that fields that reference other symbol records do so correctly. Differential Revision: https://reviews.llvm.org/D34906 llvm-svn: 306890	2017-06-30 21:35:00 +00:00
Reid Kleckner	45a7462094	[codeview] Use the first valid source location at the top of every MBB If the instructions at the beginning of the block have no location, we're better off using the location of the first instruction in the current basic block. At the very least, that instruction post-dominates this one, whereas if we don't emit a .cv_loc directive, we end up using the potentially invalid location that falls through from the previous block. We could probably do better here by emitting some kind of ".cv_loc end" directive that stops the line table entry of the previous .cv_loc directive from bleeding out of its basic block. This would improve the line table when an entire MBB has no valid location info. llvm-svn: 306889	2017-06-30 21:33:44 +00:00
Krzysztof Parzyszek	9eb75c4520	[Hexagon] Implement frame pointer elimination with -fomit-frame-pointer It applies to leaf functions that are otherwise not required to have a frame pointer. llvm-svn: 306888	2017-06-30 21:21:40 +00:00
Ayal Zaks	2ff59d4350	[LV] Sink casts to unravel first order recurrence Check if a single cast is preventing handling a first-order-recurrence Phi, because the scheduling constraints it imposes on the first-order-recurrence shuffle are infeasible; but they can be made feasible by moving the cast downwards. Record such casts and move them when vectorizing the loop. Differential Revision: https://reviews.llvm.org/D33058 llvm-svn: 306884	2017-06-30 21:05:06 +00:00
Richard Smith	d0c0c13447	Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR This is a short-term fix for PR33650 aimed to get the modules build bots green again. Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR macros to try to locally specialize a global template for a global type. That's not how C++ works. Instead, we now centrally define how to format vectors of fundamental types and of string (std::string and StringRef). We use flow formatting for the former cases, since that's the obvious right thing to do; in the latter case, it's less clear what the right choice is, but flow formatting is really bad for some cases (due to very long strings), so we pick block formatting. (Many of the cases that were using flow formatting for strings are improved by this change.) Other than the flow -> block formatting change for some vectors of strings, this should result in no functionality change. Differential Revision: https://reviews.llvm.org/D34907 Corresponding updates to clang, clang-tools-extra, and lld to follow. llvm-svn: 306878	2017-06-30 20:56:57 +00:00
Sumanth Gundapaneni	d2dd79bf84	[Hexagon] Guard the generation of lookup table The llvm flag "-hexagon-emit-lookup-tables" guards the generation of lookup table generated from a switch statement. Differential Revision: https://reviews.llvm.org/D34819 llvm-svn: 306877	2017-06-30 20:54:24 +00:00
Ulrich Weigand	03ab2e2b1c	[SystemZ] Add all remaining instructions This adds all remaining instructions that were still missing, mostly privileged and semi-privileged system-level instructions. These are provided for use with the assembler and disassembler only. This brings the LLVM assembler / disassembler to parity with the GNU binutils tools. llvm-svn: 306876	2017-06-30 20:43:40 +00:00
Tim Northover	ff5e7e1295	GlobalISel: add G_IMPLICIT_DEF instruction. It looks like there are two target-independent but not GISel instructions that need legalization, IMPLICIT_DEF and PHI. These are already anomalies since their operands have important LLTs attached, so to make things more uniform it seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF. llvm-svn: 306875	2017-06-30 20:27:36 +00:00
Sumanth Gundapaneni	8c5d59557d	[Hexagon] Emit jump tables in text section based on a flag This patch adds a new LLVM flag -hexagon-emit-jt-text which is defaulted to "false". The value "true" emits the switch generated jump tables in text section. Differential Revision: https://reviews.llvm.org/D34820 llvm-svn: 306872	2017-06-30 20:21:48 +00:00
Sumanth Gundapaneni	19b74203b1	Revert "[Hexagon] Guard the generation of lookup table" This reverts commit ae521f4192c3ed0202c047fec993cb59133dd1a0. Wrong commit message llvm-svn: 306871	2017-06-30 20:20:00 +00:00
Sumanth Gundapaneni	cf73758dc8	[Hexagon] Guard the generation of lookup table The llvm flag "-hexagon-emit-lookup-tables" guards the generation of lookup table from a switch statement. Differential Revision: https://reviews.llvm.org/D34819 llvm-svn: 306869	2017-06-30 20:10:28 +00:00
Sumanth Gundapaneni	5372f0a73e	[SimplifyCFG] Update the name of switch generated lookup table. This patch appends the name of the function to the switch generated lookup table. This will ease the visual debugging in identifying the function the table is generated from. Differential Revision: https://reviews.llvm.org/D34817 llvm-svn: 306867	2017-06-30 20:00:01 +00:00
Tim Northover	2b5f03aa12	ARM: fix big-endian 64-bit cmpxchg. On big-endian machines the high and low parts of the value accessed by ldrexd and strexd are swapped around. To account for this we swap inputs and outputs in ISelLowering. Patch by Bharathi Seshadri. llvm-svn: 306865	2017-06-30 19:51:02 +00:00
Sanjay Patel	1be7ea4ad5	[PowerPC] auto-generate check lines; NFC The existing check lines were more flexible, but these are small enough tests that there shouldn't be much question about register allocation. I've been hand-modifying this file as I change the CGP memcmp expansion, but that's more error-prone and time-consuming than just running the update script. llvm-svn: 306861	2017-06-30 19:20:54 +00:00
Zachary Turner	990f01f8c6	Fix test broken by parameter mixup. llvm-svn: 306856	2017-06-30 18:25:07 +00:00
Eric Beckmann	e44afc4aff	Fix bug in symbol generation for resource COFF Symbols in the resource COFF file should be for .rsrc$02, where the actual resource data is, not .rsrc$01, which contains the directory tree. Differential Revision: https://reviews.llvm.org/D34832 Patch by Joe Ranieri. llvm-svn: 306853	2017-06-30 18:16:35 +00:00
Zachary Turner	02a267758e	[llvm-pdbutil] Add the ability to dump the dependency tree for a type Previously we had the -type-index option which would dump the record of a single, but we had no way to follow the dependency graph backwards and also dump all dependent types. Having this option makes test-writing better, because we can limit the test to only those records that are of importance for the thing we're trying to test, which allows us to use things like CHECK-NEXT to reduce fragility. Differential Revision: https://reviews.llvm.org/D34899 llvm-svn: 306852	2017-06-30 18:15:47 +00:00
Anna Thomas	e5e5e59d8b	[RuntimeUnrolling] Add logic for loops with multiple exit blocks Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments). This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled. Reviewers: mkuper, sanjoy, reames, evstupac Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33001 llvm-svn: 306846	2017-06-30 17:57:07 +00:00
Zachary Turner	e9db96e6d9	Revert "[lit] Clean output directories before running tests." This reverts commit da6318a92fba793e4f2447ec478b001392d57d43. This is causing failures on some build bots due to what appears to be some kind of lit ordering dependency. llvm-svn: 306833	2017-06-30 16:05:03 +00:00
Zachary Turner	0955739b36	[lit] Clean output directories before running tests. Presently lit leaks files in the tests' output directories. Specifically, if a test creates output files, lit makes no effort to remove them prior to the next test run. This is problematic because it leads to false positives whenever a test passes because stale files were present. In general it is a source of flakiness that should be removed. This patch addresses this by building the list of all test directories that are part of the current run set, and then deleting those directories and recreating them anew. This gives each test a clean baseline to start from. Differential Revision: https://reviews.llvm.org/D34732 llvm-svn: 306832	2017-06-30 16:01:30 +00:00
Simon Dardis	da96c43682	[MIPS] Handle PIC load address macro instructions in N64. In particular, use CALL16 (similar to O32) for address loads into T9 for certain cases. Otherwise use a %got_disp relocation to load the address of a symbol. Small offsets (small enough to fit in a 16-bit signed immediate) can be used and are added to the symbol address after it is loaded from the GOT. Larger offsets are currently unsupported and result in an error from the assembler. Reviewers: sdardis Reviewed By: sdardis Patch by: John Baldwin Subscribers: llvm-commits, seanbruno, arichardson, emaste, dim Differential Revision: https://reviews.llvm.org/D33948 llvm-svn: 306831	2017-06-30 15:44:27 +00:00
Alexey Bataev	8843aab83a	[SLP] A test for limiting vectorization of instructions, NFC. llvm-svn: 306828	2017-06-30 14:37:32 +00:00
Teresa Johnson	b247ffbaed	[LTO] Remove values from non-prevailing comdats Summary: When linking a regular LTO module, if it has any non-prevailing values (dropped to available_externally) in comdats, we need to do more than just remove those values from their comdat. We also remove all values from that comdat, so as to avoid leaving an incomplete comdat. This is necessary in case we are compiling in mixed regular and ThinLTO mode, since the resulting regularLTO native object is always linked into the final binary first. We need to prevent the linker from selecting an incomplete comdat that was not the prevailing copy. Fixes PR32980. Reviewers: pcc, rafael Subscribers: mehdi_amini, david2050, llvm-commits, inglorion Differential Revision: https://reviews.llvm.org/D34803 llvm-svn: 306826	2017-06-30 14:03:24 +00:00
Ulrich Weigand	9932f92882	[SystemZ] Add missing high-word facility instructions There are a few instructions provided by the high-word facility (z196) that we cannot easily exploit for code generation. This patch at least adds those missing instructions for the assembler and disassembler. This means that now all nonprivileged instructions up to z13 are supported by the LLVM assembler / disassembler. llvm-svn: 306821	2017-06-30 12:56:29 +00:00
Nirav Dave	a35938d827	Revert "[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset" This reverts commit r306819 which appears be exposing underlying issues in a stage1 ppc64be build llvm-svn: 306820	2017-06-30 12:56:02 +00:00
Nirav Dave	c5a48c1ee8	[DAG] Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffset As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using generic checks. Also, propagate missing local handling from there to BaseIndexOffset checks. Tests of note: * test/CodeGen/X86/build-vector* - Improved. * test/CodeGen/BPF/undef.ll - Improved store alignment allows an additional store merge * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a case we already do not handle well. Here, the DAG is improved, but scheduling causes a code size degradation. Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D34472 llvm-svn: 306819	2017-06-30 12:23:41 +00:00
Simon Pilgrim	e5e9232260	[X86] Updated 32-bit memcmp tests to run with/without SSE2 llvm-svn: 306816	2017-06-30 11:23:59 +00:00
Nikolai Bozhenov	bde9b14c6f	Revert of r306525: "Canonicalize clamp of float types to minmax" llvm-svn: 306815	2017-06-30 10:39:09 +00:00
George Rimar	892c6c86ea	[YAML] - Teach yaml2obj/obj2yaml to work with numeric relocation values. That may be useful if we want to produce or parse object containing broken relocation values using yaml2obj/obj2yaml. Previously that was impossible because only enum values were parsed correctly, this patch allows to put any numeric value as a relocation type. Differential revision: https://reviews.llvm.org/D34758 llvm-svn: 306814	2017-06-30 10:31:03 +00:00
Hiroshi Inoue	a89d4b5f2f	fix trivial typos, NFC llvm-svn: 306808	2017-06-30 09:11:50 +00:00
Ayal Zaks	8d26f0a602	[LV] Optimize for size when vectorizing loops with tiny trip count It may be detrimental to vectorize loops with very small trip count, as various costs of the vectorized loop body as well as enclosing overheads including runtime tests and scalar iterations may outweigh the gains of vectorizing. The current cost model measures the cost of the vectorized loop body only, expecting it will amortize other costs, and loops with known or expected very small trip counts are not vectorized at all. This patch allows loops with very small trip counts to be vectorized, but under OptForSize constraints, which ensure the cost of the loop body is dominant, having no runtime guards nor scalar iterations. Patch inspired by D32451. Differential Revision: https://reviews.llvm.org/D34373 llvm-svn: 306803	2017-06-30 08:02:35 +00:00
Craig Topper	97cd0173b9	[InstCombine] Add test cases to demonstrate failure to fold (a \| b) ^ (~a \| ~b) --> ~(a ^ b) and its commuted variants. llvm-svn: 306801	2017-06-30 07:37:42 +00:00
Craig Topper	880bf82685	[InstCombine] In foldXorToXor, move the commutable matcher from the LHS match to the RHS match. No meaningful change intended. There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique. This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice. llvm-svn: 306800	2017-06-30 07:37:41 +00:00
Chandler Carruth	3545a9e1f9	Remove the BBVectorize pass. It served us well, helped kick-start much of the vectorization efforts in LLVM, etc. Its time has come and past. Back in 2014: http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html Time to actually let go and move forward. =] I've updated the release notes both about the removal and the deprecation of the corresponding C API. llvm-svn: 306797	2017-06-30 07:09:08 +00:00
Martin Storsjo	43c854535e	[llvm-readobj] Improve printouts for COFF ARM64 binaries Differential Revision: https://reviews.llvm.org/D34835 llvm-svn: 306795	2017-06-30 07:02:13 +00:00
Martin Storsjo	8ae07ac837	[llvm-readobj] Include the PE magic value in printouts This is useful for a testcase in lld. Differential Revision: https://reviews.llvm.org/D34836 llvm-svn: 306794	2017-06-30 07:02:04 +00:00
Daniel Jasper	3b704ceba1	Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion" Segfaults in non-optimized builds. I'll get a stack trace and a reproducer to Teresa. llvm-svn: 306793	2017-06-30 06:37:33 +00:00
Daniel Jasper	5ce1ce742e	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306792	2017-06-30 06:32:21 +00:00
Max Kazantsev	8d0322e612	[SCEV] Use depth limit instead of local cache for SExt and ZExt In rL300494 there was an attempt to deal with excessive compile time on invocations of getSign/ZeroExtExpr using local caching. This approach only helps if we request the same SCEV multiple times throughout recursion. But in the bug PR33431 we see a case where we request different values all the time, so caching does not help and the size of the cache grows enormously. In this patch we remove the local cache for this methods and add the recursion depth limit instead, as we do for arithmetics. This gives us a guarantee that the invocation sequence is limited and reasonably short. Differential Revision: https://reviews.llvm.org/D34273 llvm-svn: 306785	2017-06-30 05:04:09 +00:00
Jakub Kuderski	837755cf8b	[Dominators] Don't compute DFS InOut numbers eagerly. Summary: DFS InOut numbers currently get eagerly computer upon DomTree construction. They are only needed to answer dome dominance queries and they get invalidated by updates and recalculations. Because of that, it is faster in practice to compute them lazily when they are actually needed. Clang built without this patch takes 6m 45s to boostrap on my machine, and with the patch applied 6m 38s. Reviewers: sanjoy, dberlin, chandlerc Reviewed By: dberlin Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D34296 llvm-svn: 306778	2017-06-30 01:28:21 +00:00
Heejin Ahn	ac62b05d05	[WebAssembly] Add support for exception handling instructions Summary: This adds backend support for throw, rethrow, try, and try_end instructions. This needs the corresponding clang builtin support: https://reviews.llvm.org/D34783 This follows the Wasm exception handling proposal in https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md Reviewers: sunfish, dschuff Reviewed By: dschuff Subscribers: jfb, sbc100, jgravelle-google Differential Revision: https://reviews.llvm.org/D34826 llvm-svn: 306774	2017-06-30 00:43:15 +00:00
Eric Christopher	ee837a59f7	Unified logic for computing target ABI in backend and front end by moving this common code to Support/TargetParser. Modeled Triple::GNU after front end code (aapcs abi) and updated tests that expect apcs abi. Based heavily on a patch by Ana Pazos! llvm-svn: 306768	2017-06-30 00:03:54 +00:00
Aditya Nandakumar	20f6207013	[GISel]: New Opcode G_FLOG/G_FLOG2 https://reviews.llvm.org/D34837 llvm-svn: 306766	2017-06-29 23:43:44 +00:00
Taewook Oh	0e35ea3b7c	Remove redundant copy in recurrences Summary: If there is a chain of instructions formulating a recurrence, commuting operands can help removing a redundant copy. In the following example code, ``` BB#1: ; Loop Header %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: ; Loop Latch %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def,tied1> = ADD32rr %vreg1<kill,tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1,%vreg0 %vreg3<def,tied1> = ADD32rr %vreg2<kill,tied0>, %vreg10<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2,%vreg10 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> ``` Existing two-address generation pass generates following code: ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg1<kill>; GR32:%vreg10,%vreg1 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg0<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` This is suboptimal because the assembly code generated has a redundant copy at the end of #BB6 to feed %vreg13 to BB#1: ``` .LBB0_6: addl %esi, %edi addl %ebx, %edi cmpl $10, %edi movl %edi, %esi jl .LBB0_1 ``` This redundant copy can be elimiated by making instructions in the recurrence chain to compute the value "into" the register that actually holds the feedback value. In this example, this can be achieved by commuting %vreg0 and %vreg1 to compute %vreg10. With that change, code after two-address generation becomes ``` BB#1: %vreg0<def> = COPY %vreg13<kill>; GR32:%vreg0,%vreg13 ... BB#6: derived from LLVM BB %bb7 Predecessors according to CFG: BB#5 BB#4 %vreg2<def> = COPY %vreg15<kill>; GR32:%vreg2,%vreg15 %vreg10<def> = COPY %vreg0<kill>; GR32:%vreg10,%vreg0 %vreg10<def,tied1> = ADD32rr %vreg10<tied0>, %vreg1<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg1 %vreg3<def> = COPY %vreg10<kill>; GR32:%vreg3,%vreg10 %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg2 CMP32ri8 %vreg3, 10, %EFLAGS<imp-def>; GR32:%vreg3 %vreg13<def> = COPY %vreg3<kill>; GR32:%vreg13,%vreg3 JL_1 <BB#1>, %EFLAGS<imp-use,kill> JMP_1 <BB#7> ``` and the final assembly does not have redundant copy: ``` .LBB0_6: addl %edi, %eax addl %ebx, %eax cmpl $10, %eax jl .LBB0_1 ``` Reviewers: qcolombet, MatzeB, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31821 llvm-svn: 306758	2017-06-29 23:11:24 +00:00
Tim Shen	664706916b	[ThinkLTO] Invoke build(Thin)?LTOPreLinkDefaultPipeline. Previously it doesn't actually invoke the designated new PM builder functions. This patch moves NameAnonGlobalPass out from PassBuilder, as Chandler points out that PassBuilder is used for non-O0 builds, and for optimizations only. Differential Revision: https://reviews.llvm.org/D34728 llvm-svn: 306756	2017-06-29 23:08:38 +00:00
Simon Dardis	dede76f428	Revert "[mips] Fix multiprecision arithmetic." This reverts commit r305389. This broke chromium builds, so reverting while I investigate further. llvm-svn: 306741	2017-06-29 20:59:47 +00:00
Keno Fischer	05e4ac26a2	[CodeGenPrepare] Don't create inttoptr for ni ptrs Summary: Arguably non-integral pointers probably shouldn't show up here at all, but since the backend doesn't complain and this takes valid (according to the Verifier) IR and makes it invalid, make sure not to introduce any inttoptr instructions if we're dealing with non-integral pointers. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D33110 llvm-svn: 306737	2017-06-29 20:28:59 +00:00
Spyridoula Gravani	837c110cb1	[DWARF] Added verification checks for the .apple_names section. This patch verifies the number of atoms, the validity of the form for each atom, as well as the validity of the hashdata. For hashdata, we're verifying that the hashdata offset is correct and that the offset in the .debug_info for each DIE in the hashdata is also valid. llvm-svn: 306735	2017-06-29 20:13:05 +00:00
Keno Fischer	99886f09a1	[AliasSetTracker] Don't drop AA MD so eagerly Summary: When we have patterns like loop: %la = load %ptr, !tbaa %lba = load %ptr, !tbaa !noalias AliasSetTracker would previously think that the two types of annotation for the pointer conflict, dropping both for the purpose of determining alias sets. That is clearly way too conservative, as the tbaa is still valid whether or not one of the memory accesses has additional AA metadata. We could go one step further and attempt to properly merge the AA metadata, but it's not clear that that would be worth it since that may introduce additional MD nodes, which may be undesirable since this is merely an Analysis. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32139 llvm-svn: 306727	2017-06-29 19:13:11 +00:00
Alexandre Isoard	41044876fc	Reverting r306695 while investigating failing test case. Failing test case: Transforms/LoopVectorize.iv_outside_user.ll llvm-svn: 306723	2017-06-29 18:48:56 +00:00
Sam Clegg	531da0081f	llvm-nm: Add support for symbol demangling (-C/--demangle) Differential Revision: https://reviews.llvm.org/D34668 llvm-svn: 306718	2017-06-29 18:29:05 +00:00
Leo Li	20fbad9307	[ConstantHoisting] Avoid hoisting constants in GEPs that index into a struct type. Summary: Indices for GEPs that index into a struct type should always be constants. This added more checks in `collectConstantCandidates:` which make sure constants for GEP pointer type are not hoisted. This fixed Bug https://bugs.llvm.org/show_bug.cgi?id=33538 Reviewers: ributzka, rnk Reviewed By: ributzka Subscribers: efriedma, llvm-commits, srhines, javed.absar, pirama Differential Revision: https://reviews.llvm.org/D34576 llvm-svn: 306704	2017-06-29 17:03:34 +00:00
Alexandre Isoard	aa29afc756	ScalarEvolution: Add URem support In LLVM IR the following code: %r = urem <ty> %t, %b is equivalent to: %q = udiv <ty> %t, %b %s = mul <ty> nuw %q, %b %r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t As UDiv, Mul and Sub are already supported by SCEV, URem can be implemented with minimal effort this way. Note: While SRem and SDiv are also related this way, SCEV does not provides SDiv yet. llvm-svn: 306695	2017-06-29 16:29:04 +00:00
Krzysztof Parzyszek	0089419417	[Hexagon] Keep all phi nodes when building DFG in addr-mode-opt The dead phis are needed for finding correct would-be reaching defs in register propagation. llvm-svn: 306690	2017-06-29 15:55:59 +00:00
Eugene Leviant	6269d39f44	[llvm-objdump] Handle invalid instruction gracefully on ARM Differential revision: https://reviews.llvm.org/D34813 llvm-svn: 306687	2017-06-29 15:38:47 +00:00
Yonghong Song	5fbe01b12d	bpf: remove unnecessary truncate operation For networking-type bpf program, it often needs to access packet data. A context data structure is provided to the bpf programs with two fields: u32 data; u32 data_end; User can access these two fields with ctx->data and ctx->data_end. During program verification process, the kernel verifier modifies the bpf program with loading of actual pointer value from kernel data structure. r = ctx->data ===> r = actual data start ptr r = ctx->data_end ===> r = actual data end ptr A typical program accessing ctx->data like char data_ptr = (char )(long)ctx->data will result in a 32-bit load followed by a zero extension. Such an operation is combined into a single LDW in DAG combiner as bpf LDW does zero extension automatically. In cases like the below (which can be a result of global value numbering and partial redundancy elimination before insn selection): B1: u32 a = load-32-bit &ctx->data u64 pa = zext a ... B2: u32 b = load-32-bit &ctx->data u64 pb = zext b ... B3: u32 m = PHI(a, b) u64 pm = zext m In B3, "pm = zext m" cannot be removed, which although is legal from compiler perspective, will generate incorrect code after kernel verification. This patch recognizes this pattern and traces through PHI node to see whether the operand of "zext m" is defined with LDWs or not. If it is, the "zext m" itself can be removed. The patch also recognizes the pattern where the load and use of the load value not in the same basic block, where truncate operation may be removed as well. The patch handles 1-byte, 2-byte and 4-byte truncation. Two test cases are added to verify the transformation happens properly for the above code pattern. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 306685	2017-06-29 15:18:54 +00:00
Nikolai Bozhenov	1925594ea0	[NFC] Use stdin for some tests instead of positional argument. Summary: Otherwise unexpected matches with the path to the tests might happen. Reviewers: rengolin, spatel, efriedma, RKSimon Reviewed By: spatel Subscribers: n.bozhenov, javed.absar, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D32994 llvm-svn: 306684	2017-06-29 14:51:54 +00:00
Daniel Neilson	aad1a6f0a4	Restore original intent of memset instcombine test Summary: The original intent of test/Transforms/InstCombine/memset.ll was to test for lowering of llvm.memset into stores when the size of the memset is 1, 2, 4, or 8. Sometime between then and now the test has stopped testing for that, but remained passing due to testing for the absence of llvm.memset calls rather than the presence of store instructions. Right now this test ends up with an empty function body because the alloca is eliminated as safe-to-remove, which results in the llvm.memset calls's being eliminated due to their pointer args being undef; so it is not testing for conversion of llvm.memset into store instructions at all. This change alters the test to verify that store instructions are created, and moves the target of the memset to an arg of the proc to avoid it being eliminated as unused. Reviewers: anna, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D34642 llvm-svn: 306681	2017-06-29 14:21:28 +00:00
Daniel Neilson	82b016672d	Explicitly check for presence of correct results in instcombine memmove test Summary: Rather than testing for expected results, test/Transforms/InstCombine/memmove.ll is testing for the absence of calls to llvm.memmove. In the case of test3, the test has stopped testing for materialization of loads/stores, but remained passing due to testing for the absence of llvm.memset calls rather than the presence of load/store instructions. Right now this test ends up with an empty function body because the alloca is eliminated as safe-to-remove, which results in the llvm.memmove calls being eliminated due to a pointer arg being undef; so it is not testing for conversion of llvm.memmove into load/store instructions at all. Reviewers: eli.friedman, anna, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D34645 llvm-svn: 306679	2017-06-29 14:17:50 +00:00
Hiroshi Inoue	6989caa931	[PowerPC] fix potential verification error on __tls_get_addr This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed. Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP. Differential Revision: https://reviews.llvm.org/D34347 llvm-svn: 306678	2017-06-29 14:13:38 +00:00
Daniel Jasper	559aa75382	Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue" I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676	2017-06-29 13:58:24 +00:00
Igor Breger	0cddd34876	[GlobalISel][X86] Support vector type G_MERGE_VALUES selection. Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665	2017-06-29 12:08:28 +00:00
Simon Pilgrim	9a68e69c68	[X86][SSE] Dropped -mcpu from palignr tests Use triple and attribute only for consistency Add AVX tests as well llvm-svn: 306664	2017-06-29 11:13:39 +00:00
Simon Pilgrim	e2eacbfc23	[X86][SSE] Regenerate shuffle test with update_llc_test_checks.py llvm-svn: 306663	2017-06-29 11:11:37 +00:00
Simon Pilgrim	0afe97f480	[X86][SSE] Dropped -mcpu from vector shift tests Use triple and attribute only for consistency llvm-svn: 306662	2017-06-29 11:09:53 +00:00
Simon Pilgrim	91539ce2d3	[X86][SSE] Dropped -mcpu from zero insertion tests Use triple and attribute only for consistency llvm-svn: 306661	2017-06-29 11:08:11 +00:00
Michael Zuckerman	4bcb9c3349	[LLVM][X86][Goldmont] Adding new target-cpu: Goldmont [LLVM SIDE] Connecting the GoldMont processor to his feature. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658	2017-06-29 10:00:33 +00:00
Florian Hahn	08fdd040b5	[ARM] Add tGPRwithpc register class and use it for TBB/THH Summary: TBB and THH allow using a Thumb GPR or the PC as destination operand. A few machine verifier failures where due to those instructions not expecting PC as destination operand. Add -verify-machineinstrs to test/CodeGen/ARM/jump-table-tbh.ll to add test coverage even if expensive checks are disabled. Reviewers: MatzeB, t.p.northover, jmolloy Reviewed By: MatzeB Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34610 llvm-svn: 306654	2017-06-29 08:45:31 +00:00
Zvi Rackover	da3943d600	[X86] Adding shuffle tests demonstrating missed vcompress opportunities. NFC llvm-svn: 306646	2017-06-29 06:22:01 +00:00
Keno Fischer	a236dae5d1	[InstCombine] Retain TBAA when narrowing memory accesses Summary: As discussed on the mailing list it is legal to propagate TBAA to loads/stores from/to smaller regions of a larger load tagged with TBAA. Do so for (load->extractvalue)=>(gep->load) and similar foldings. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D31954 llvm-svn: 306615	2017-06-28 23:36:40 +00:00
Matt Arsenault	7c525903ef	AMDGPU: Remove SITypeRewriter This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603	2017-06-28 21:38:50 +00:00
Stanislav Mekhanoshin	a45584bebe	Fold fneg and fabs like multiplications Given no NaNs and no signed zeroes it folds: (fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X)) (fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X) Differential Revision: https://reviews.llvm.org/D34579 llvm-svn: 306592	2017-06-28 20:25:50 +00:00
Sanjay Patel	57f57262c5	[InstCombine] add tests for icmp with bitreversed ops; NFC This is similar enough to bswap that we might as well handle them together in one patch. llvm-svn: 306591	2017-06-28 20:02:35 +00:00
Geoff Berry	378374d457	[AArch64][Falkor] Try to avoid exhausting HW prefetcher resources when unrolling. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34533 llvm-svn: 306584	2017-06-28 18:53:09 +00:00
Simon Pilgrim	64d182d36f	[BBVectorize][X86] Regenerate simple tests llvm-svn: 306578	2017-06-28 18:08:40 +00:00
Craig Topper	65aeba70de	[InstCombine] Remove 64-bit bit width restriction from m_ConstantInt(uint64_t*&) I think we only need to make sure the value fits in 64-bits not that bit width is 64-bit. This helps places that use this for shift amounts since the shift amount needs to be the same bitwidth as the LHS, but can't be larger than the bit width. Differential Revision: https://reviews.llvm.org/D34737 llvm-svn: 306577	2017-06-28 18:07:29 +00:00
Ayal Zaks	d9bc43ef2a	[LV] Fix PR33613 - retain order of insertelement per part r306381 caused PR33613, by reversing the order in which insertelements were generated per unroll part. This patch fixes PR33613 by retraining this order, placing each set of insertelements per part immediately after the last scalar being packed for this part. Includes a test case derived from PR33613. Reference: https://bugs.llvm.org/show_bug.cgi?id=33613 Differential Revision: https://reviews.llvm.org/D34760 llvm-svn: 306575	2017-06-28 17:59:33 +00:00
Rafael Espindola	96367a3d1e	Fix PR33625. We were failing to convert this expression to pcrel. llvm-svn: 306573	2017-06-28 17:56:07 +00:00
Simon Pilgrim	54822d1432	[BBVectorize] Regenerate simple tests llvm-svn: 306571	2017-06-28 17:40:20 +00:00
Chih-Hung Hsieh	514dafdae3	Another test commit. llvm-svn: 306567	2017-06-28 17:12:51 +00:00
Geoff Berry	b0573547f6	[LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCount Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564	2017-06-28 17:01:15 +00:00
Sanjay Patel	1a132d27c6	[InstCombine] add tests for icmp with bswapped operands; NFC llvm-svn: 306563	2017-06-28 16:56:45 +00:00
Krzysztof Parzyszek	3008594cd4	Missed a check for UndefVI in r306466 llvm-svn: 306553	2017-06-28 15:46:16 +00:00
Alexandros Lamprineas	c0432d86aa	[AArch64] AArch64CondBrTuningPass generates wrong branch instructions Some conditional branch instructions generated by this pass are checking the wrong condition code. The instructions TBZ and TBNZ are transformed into B.GE and B.LT instead of B.PL and B.MI respectively. They should only be checking the Negative bit. Differential Revision: https://reviews.llvm.org/D34743 llvm-svn: 306550	2017-06-28 15:09:11 +00:00
John Brawn	75d76e5e95	[ARM] Improve if-conversion for M-class CPUs without branch predictors The current heuristic in isProfitableToIfCvt assumes we have a branch predictor, and so gives the wrong answer in some cases when we don't. This patch adds a subtarget feature to indicate that a subtarget has no branch predictor, and changes the heuristic in isProfitableToiIfCvt when it's present. This gives a slight overall improvement in a set of embedded benchmarks on Cortex-M4 and Cortex-M33. Differential Revision: https://reviews.llvm.org/D34398 llvm-svn: 306547	2017-06-28 14:11:15 +00:00
Simon Pilgrim	48b30c3d55	[X86] Added BSWAP tests for illegal i64/i128/i256 'wide' scalar integers llvm-svn: 306546	2017-06-28 14:07:50 +00:00
Simon Pilgrim	4f5fcb03ad	[X86][SSE] Dropped -mcpu from vector bswap tests Use triple and attribute only for consistency llvm-svn: 306545	2017-06-28 13:59:15 +00:00
Michael Zuckerman	d0e663a697	[X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test. Exapnding the test to include AVX target. Adding base tast (to trunk) for Store strid=4 vf=32. llvm-svn: 306543	2017-06-28 13:42:45 +00:00
Teresa Johnson	538b8d25f0	Add zero-length check to memcpy/memset load store loop expansion Summary: I was testing using this expansion logic in other cases besides NVPTX, and found some runtime failures due to the lack of a check for a zero length memcpy/memset before the loop. There is already such a check in the memmove expansion code though. Reviewers: hfinkel Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D34707 llvm-svn: 306541	2017-06-28 13:07:37 +00:00
Igor Breger	86cf07a32e	[GlobalISel][X86] Test G_CONSTANT i32 0 TableGen'erated selection.NFC. llvm-svn: 306537	2017-06-28 12:43:21 +00:00
Nikolai Bozhenov	6710ba07c7	Revert r306528 llvm-svn: 306536	2017-06-28 12:15:13 +00:00
Igor Breger	d5b59cf914	[GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533	2017-06-28 11:39:04 +00:00
Michael Zuckerman	f66840020c	Reverting commit 306414 on behalf of @gadi.haber llvm-svn: 306532	2017-06-28 11:23:31 +00:00
Simon Pilgrim	b9fa16bc53	[X86][AVX2] Dropped -mcpu from avx2 arithmetic/intrinsics tests Use triple and attribute only for consistency llvm-svn: 306531	2017-06-28 10:54:54 +00:00
Petar Jovanovic	7b3a38ec30	[X86] Correct dwarf unwind information in function epilogue CFI instructions that set appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering. Majority of the changes in this patch: 1. Ensure that CFI instructions do not affect code generation. 2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, duplicated. These changes are target independent and described below. Changed CFI instructions so that they: 1. are duplicable 2. are not counted as instructions when tail duplicating or tail merging 3. can be compared as equal Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier) that checks that outgoing cfa offset and register of predecessor blocks match incoming values of their successors. Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects CFA calculation rule for a basic block if needed. That means that additional CFI instructions get inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D18046 llvm-svn: 306529	2017-06-28 10:21:17 +00:00
Nikolai Bozhenov	77b5536e4e	[ValueTracking] Enabling existing ValueTracking patch by default. The original patch was an improvement to IR ValueTracking on non-negative integers. It has been checked in to trunk (D18777, r284022). But was disabled by default due to performance regressions. Perf impact has improved. The patch would be enabled by default. Reviewers: reames Differential Revision: https://reviews.llvm.org/D34101 Patch by: Olga Chupina <olga.chupina@intel.com> llvm-svn: 306528	2017-06-28 10:08:08 +00:00
Nikolai Bozhenov	b01e6b5a52	[InstCombine] Canonicalize clamp of float types to minmax in fast mode. Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM nodes are illegal thus selection is not happening. So I decided to do such kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 306525	2017-06-28 09:26:20 +00:00
Nikolai Bozhenov	4ec1bb6f39	Add tests to document current InstCombine behavior for clamp pattern. Summary: This commit adds the tests for clamp pattern as a prerequisite of D33186 to make the impact of that fix more clear and also to document current behavior. Reviewers: spatel, jmolloy Reviewed By: spatel Subscribers: n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D34350 llvm-svn: 306524	2017-06-28 09:22:58 +00:00
Kristof Beyls	eecb353d0e	[ARM] Make -mcpu=generic schedule for an in-order core (Cortex-A8). The benchmarking summarized in http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed this is beneficial for a wide range of cores. As is to be expected, quite a few small adaptations are needed to the regressions tests, as the difference in scheduling results in: - Quite a few small instruction schedule differences. - A few changes in register allocation decisions caused by different instruction schedules. - A few changes in IfConversion decisions, due to a difference in instruction schedule and/or the estimated cost of a branch mispredict. llvm-svn: 306514	2017-06-28 07:07:03 +00:00
Craig Topper	8fe3603ff1	[InstCombine] Add test case demonstrating that we don't handle icmp eq (trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC llvm-svn: 306510	2017-06-28 06:45:36 +00:00
Craig Topper	7f124694c5	Revert r306508 "[InstCombine] Add test case demonstrating that we don't handle icmp eq (trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC" I accidentally had a extra change in there. llvm-svn: 306509	2017-06-28 06:43:58 +00:00
Craig Topper	1d5b4b634b	[InstCombine] Add test case demonstrating that we don't handle icmp eq (trunc (lshr(X, cst1)), cst->icmp (and X, mask), cst when the shift type is larger than 64-bits. NFC llvm-svn: 306508	2017-06-28 06:42:48 +00:00
Stanislav Mekhanoshin	d445455643	[AMDGPU] Add pattern for v_alignbit_b32 with immediate If immediate in shift is less than 32 we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500	2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin	eb40733bf0	Allow to truncate left shift with non-constant shift amount That is pretty common for clang to produce code like (shl %x, (and %amt, 31)). In this situation we can still perform trunc (shl) into shl (trunc) conversion given the known value range of shift amount. Differential Revision: https://reviews.llvm.org/D34723 llvm-svn: 306499	2017-06-28 02:37:11 +00:00
Mandeep Singh Grang	0c72172e32	[COFF, ARM64] Add support for Windows ARM64 COFF format Summary: This is the llvm part of the initial implementation to support Windows ARM64 COFF format. I will gradually add more functionality in subsequent patches. Reviewers: ruiu, rnk, t.p.northover, compnerd Reviewed By: ruiu, compnerd Subscribers: aemerson, mgorny, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D34705 llvm-svn: 306490	2017-06-27 23:58:19 +00:00
Peter Collingbourne	99b98c21f2	Object: Teach irsymtab::read() to try to use the irsymtab that we wrote to disk. Fixes PR27551. Differential Revision: https://reviews.llvm.org/D33974 llvm-svn: 306488	2017-06-27 23:50:24 +00:00
Peter Collingbourne	92648c25a4	Bitcode: Write the irsymtab to disk. Differential Revision: https://reviews.llvm.org/D33973 llvm-svn: 306487	2017-06-27 23:50:11 +00:00
Sanjay Patel	4b23fa0abf	[CGP] add specialization for memcmp expansion with only one basic block llvm-svn: 306485	2017-06-27 23:15:01 +00:00
Easwaran Raman	c5fa6358ba	[NewPM/Inliner] Reduce threshold for cold callsites in the non-PGO case Differential Revision: https://reviews.llvm.org/D34312 llvm-svn: 306484	2017-06-27 23:11:18 +00:00
Florian Hahn	2665febb54	[AArch64] Inline callee if its target-features are a subset of the caller Summary: Similar to X86, it should be safe to inline callees if their target-features are a subset of the caller. This change matches GCC's inlining behavior with respect to attributes [1]. [1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover Reviewed By: t.p.northover Subscribers: aemerson, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D34698 llvm-svn: 306478	2017-06-27 22:27:32 +00:00
Geoff Berry	2573a19fe6	[EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of EarlyCSE. llvm-svn: 306477	2017-06-27 22:25:02 +00:00
Aditya Nandakumar	cca75d2406	[GISel]: Add G_FEXP, G_FEXP2 opcodes Also add IRTranslator support. https://reviews.llvm.org/D34710 llvm-svn: 306475	2017-06-27 22:19:32 +00:00
Dehao Chen	920d022519	re-commit r306336: Enable vectorizer-maximize-bandwidth by default. Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306473	2017-06-27 22:05:58 +00:00
Sanjay Patel	70b36f193d	[CGP] eliminate a sub instruction in memcmp expansion As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471	2017-06-27 21:46:34 +00:00
Tim Northover	849fcca090	GlobalISel: verify that a COPY is trivial when created. Without this check, COPY instructions can actually be one of the generic casts in disguise. That's confusing and bad. At some point during ISel this restriction has to be relaxed since the fully selected instructions will usually use COPY for those purposes. Right now I think it's possible that relaxation occurs during RegBankSelect (hence the change there). I'm not convinced that's where it belongs long-term though. llvm-svn: 306470	2017-06-27 21:41:40 +00:00
Xinliang David Li	241f020657	Clean up a test case llvm-svn: 306468	2017-06-27 21:35:49 +00:00
Krzysztof Parzyszek	0b7688e6c0	Create a PHI value when merging with a known undef live-in Differential Revision: https://reviews.llvm.org/D34640 llvm-svn: 306466	2017-06-27 21:30:46 +00:00
Sam Clegg	3431c3e9be	[WebAssembly] Only run WebAssembly objdump tests if it is enabled as a target Differential Revision: https://reviews.llvm.org/D34712 llvm-svn: 306464	2017-06-27 21:19:27 +00:00
Sam Clegg	4df5d76477	[WebAssembly] Add support for printing relocations with llvm-objdump Differential Revision: https://reviews.llvm.org/D34658 llvm-svn: 306461	2017-06-27 20:40:53 +00:00
Sam Clegg	9e1ade93a8	[WebAssembly] Add data size and alignement to linking section The overal size of the data section (including BSS) is otherwise not included in the wasm binary. Differential Revision: https://reviews.llvm.org/D34657 llvm-svn: 306459	2017-06-27 20:27:59 +00:00
Krzysztof Parzyszek	25173e4cba	[Hexagon] Use proper predicate register state when expanding PS_vselect llvm-svn: 306458	2017-06-27 19:59:46 +00:00
Craig Topper	5fe0197622	[InstCombine] Propagate nsw flag when turning mul by pow2 into shift when the constant is a vector splat or the scalar bit width is larger than 64-bits The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less. This patch changes it to use m_APInt to remove both these issues Differential Revision: https://reviews.llvm.org/D34699 llvm-svn: 306457	2017-06-27 19:57:53 +00:00
Stanislav Mekhanoshin	e8bf6c9629	[AMDGPU] Add 2 new alignbit patterns Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449	2017-06-27 19:10:47 +00:00
Serge Guelton	7bc405aa4c	[CodeExtractor] Prevent extraction of block involving blockaddress BlockAddress are only valid within their function context, which does not interact well with CodeExtractor. Detect this case and prevent it. Differential Revision: https://reviews.llvm.org/D33839 llvm-svn: 306448	2017-06-27 18:57:53 +00:00
Stanislav Mekhanoshin	c9bd53ab59	[AMDGPU] Simplify setcc (sext from i1 b), -1\|0, cc Depending on the compare code that can be either an argument of sext or negate of it. This helps to avoid v_cndmask_b64 instruction for sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446	2017-06-27 18:53:03 +00:00
Krzysztof Parzyszek	5ddd2e5899	[Hexagon] Update kills in hexagon-nvj even more properly than before Account for the fact that both, the feeder and the compare can be moved over instructions that kill registers. llvm-svn: 306443	2017-06-27 18:37:16 +00:00
Matt Arsenault	836d786e86	RenameIndependentSubregs: Fix infinite loop Apparently this replacement can really be substituting the same as the original register. Avoid restarting the loop when there's been no change in the register uses. llvm-svn: 306441	2017-06-27 18:28:10 +00:00
Yaxun Liu	7c44f340de	[SROA] Fix APInt size when alloca address space is not 0 SROA assumes alloca address space is 0, which causes assertion. This patch fixes that. Differential Revision: https://reviews.llvm.org/D34104 llvm-svn: 306440	2017-06-27 18:26:06 +00:00
Stanislav Mekhanoshin	6851ddf942	[AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0 Also factored out function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439	2017-06-27 18:25:26 +00:00
Sanjay Patel	7227276d41	[InstCombine] canonicalize icmp predicate feeding select This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform. We have this transform for icmp+br, so unless there's some reason that icmp+select should be treated differently, we should do the same thing here. The benefit comes from increasing the chances of creating identical instructions. This is shown in the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE can simplify the identical cmps, and then InstCombine can fold the selects together. The possible regression for the tests in select.ll raises questions about poison/undef: http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html ...but that transform is just as likely to be triggered by this canonicalization as it is to be missed, so we're just pointing out a commutation deficiency in the pattern matching: https://reviews.llvm.org/rL228409 Differential Revision: https://reviews.llvm.org/D34242 llvm-svn: 306435	2017-06-27 17:53:22 +00:00
Craig Topper	9512332bcb	[InstCombine] Add test case demonstrating that we don't propagate nsw flag when converting mul by pow2 to shl when the type is larger than 64-bits. NFC llvm-svn: 306427	2017-06-27 17:16:03 +00:00
Craig Topper	d068fb8104	[InstCombine] Add test cases to show that we don't propagate 'nsw' flags when converting mul by pow2 constant to shl for splat vectors. NFC llvm-svn: 306426	2017-06-27 17:16:01 +00:00
Coby Tayree	41a5b55f50	[X86][AsmParser][MS-compatability] Binary/Unary operators enhancements Introducing MOD binary operator https://msdn.microsoft.com/en-us/library/hha180wt.aspx Enhancing unary operators NEG and NOT, to support more complex patterns Differential Revision: https://reviews.llvm.org/D33876 llvm-svn: 306425	2017-06-27 16:58:27 +00:00
Chih-Hung Hsieh	ff680f0386	Another test commit llvm-svn: 306420	2017-06-27 16:18:41 +00:00
Craig Topper	81cbb0c237	[PatternMatch] Remove 64-bit or less restriction from m_SpecificInt Not sure why this restriction existed, but it seems like we should support any size Constant here. The particular pattern in the tests is not the only use of this matcher in the tree. There's one in CodeGenPrepare and one in InstSimplify as well. Differential Revision: https://reviews.llvm.org/D34666 llvm-svn: 306417	2017-06-27 15:39:40 +00:00
Craig Topper	0a60d85811	[JumpThreading] Add test case that was supposed to go with r306085. Looks like I forgot to 'git add' when I submitted the commit. Thanks to Chandler for noticing. llvm-svn: 306416	2017-06-27 15:26:47 +00:00
Gadi Haber	13759a7ed6	Updated and extended the information about each instruction in HSW and SNB to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414	2017-06-27 15:05:13 +00:00
Sam Kolton	a179d25b99	[AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions Summary: 1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it. 2. There were several problems with support of VOPC instructions in SDWA peephole pass. Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye Differential Revision: https://reviews.llvm.org/D34626 llvm-svn: 306413	2017-06-27 15:02:23 +00:00
Matthew Simpson	0bd79f416a	[AArch64] Update successor probabilities after ccmp-conversion This patch modifies the conditional compares pass so that it keeps successor probabilities up-to-date after the conversion. Previously, successor probabilities were being normalized to a uniform distribution, even though they may have been heavily biased prior to the conversion (e.g., if one of the edges was the back edge of a loop). This loss of information affected passes later in the pipeline. Differential Revision: https://reviews.llvm.org/D34109 llvm-svn: 306412	2017-06-27 15:00:22 +00:00
Simon Dardis	4155c8f1f3	[mips] Add instruction aliases for ds(r\|l)l. Add the instruction aliases for ds(r\|l)l for the two operand alias of ds(r\|l)lv and the aliases ds(r\|l)l with the three register operands. llvm-svn: 306405	2017-06-27 13:35:17 +00:00
Hiroshi Inoue	84aafee4fb	[SelectionDAG] set dereferenceable flag in MergeConsecutiveStores to fix assetion failure When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities. This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable. Differential Revision: https://reviews.llvm.org/D34679 llvm-svn: 306404	2017-06-27 12:43:08 +00:00
Ayman Musa	721d97f7b8	Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402	2017-06-27 12:08:37 +00:00
Diana Picus	0e74a134f8	[ARM] GlobalISel: Support G_SELECT for pointers All we need to do is mark it as legal, otherwise it's just like s32. llvm-svn: 306390	2017-06-27 10:29:50 +00:00
Simon Pilgrim	71d8b67bea	[X86][AVX512] Regenerate avx512 arithmetic tests llvm-svn: 306389	2017-06-27 10:13:56 +00:00
Daniel Sanders	cc36dbf55d	[globalisel][tablegen] Add support for EXTRACT_SUBREG. Summary: After this patch, we finally have test cases that require multiple instruction emission. Depends on D33590 Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls Subscribers: javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D33596 llvm-svn: 306388	2017-06-27 10:11:39 +00:00
Diana Picus	7145d22f81	[ARM] GlobalISel: Support G_SELECT for i32 * Mark as legal for (s32, i1, s32, s32) * Map everything into GPRs * Select to two instructions: a CMP of the condition against 0, to set the flags, and a MOVCCr to select between the two inputs based on the flags that we've just set llvm-svn: 306382	2017-06-27 09:19:51 +00:00
Hiroshi Inoue	912eec378c	[PowerPC] fix incorrect processor name for -mcpu in a test case to surpress warnings. ppc970 should be 970 (or g5) llvm-svn: 306380	2017-06-27 08:35:35 +00:00
Chandler Carruth	3f81d8024c	[SROA] Fix PR32902 by more carefully propagating !nonnull metadata. This is based heavily on the work done ni D34285. I mostly wanted to do test cleanup for the author to save them some time, but I had a really hard time understanding why it was so hard to write better test cases for these issues. The problem is that because SROA does a second rewrite of the loads and because we don't propagate !nonnull for non-pointer loads, we first introduced invalid !nonnull metadata and then stripped it back off just in time to avoid most ways of this PR manifesting. Moving to the more careful utility only fixes this by changing the predicate to look at the new load's type rather than the target type. However, that does fix the bug, and the utility is much nicer including adding range metadata to model the nonnull property after a conversion to an integer. However, we have bigger problems because we don't actually propagate range metadata, and the utility to do this extracted from instcombine isn't really in good shape to do this currently. It only handles the case of copying range metadata from an integer load to a pointer load. It doesn't even handle the trivial cases of propagating from one integer load to another when they are the same width! This utility will need to be beefed up prior to using in this location to get the metadata to fully survive. And even then, we need to go and teach things to turn the range metadata into an assume the way we do with nonnull so that when we promote an integer we don't lose the information. All of this will require a new test case that looks kind-of like `preserve-nonnull.ll` does here but focuses on range metadata. It will also likely require more testing because it needs to correctly handle changes to the integer width, especially as SROA actively tries to change the integer width! Last but not least, I'm a little worried about hooking the range metadata up here because the instcombine logic for converting from a range metadata to a nonnull metadata node seems broken in the face of non-zero address spaces where null is not mapped to the integer `0`. So that probably needs to get fixed with test cases both in SROA and in instcombine to cover it. But this does extract the core PR fix from D34285 of preventing the !nonnull metadata from being propagated in a broken state just long enough to feed into promotion and crash value tracking. On D34285 there is some discussion of zero-extend handling because it isn't necessary. First, the new load size covers all of the non-undef (ie, possibly initialized) bits. This may even extend past the original alloca if loading those bits could produce valid data. The only way its valid for us to zero-extend an integer load in SROA is if the original code had a zero extend or those bits were undef. And we get to assume things like undef never satifies nonnull, so non undef bits can participate here. No need to special case the zero-extend handling, it just falls out correctly. The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to save a few rounds of trivial edits fixing style issues and test case formulation. Differental Revision: D34285 llvm-svn: 306379	2017-06-27 08:32:03 +00:00
Nicolai Haehnle	43cc6c4e0f	AMDGPU: M0 operands to spill/restore opcodes are dead Summary: With scalar stores, M0 is clobbered and therefore marked as implicitly defined. However, it is also dead. This fixes an assertion when the Greedy Register Allocator decides to optimize a spill/restore pair away again (via tryHintsRecoloring). Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33319 llvm-svn: 306375	2017-06-27 08:04:13 +00:00
Igor Breger	925f088bae	[GlobalISel][X86] Add fp32/62 legalizer, regbank-select, selection tests for G_FADD, G_FSUB, G_FMUL, G_FDIV. NFC. llvm-svn: 306370	2017-06-27 07:01:54 +00:00
Mikael Holmen	37b5120a9a	[Reassociate] Make sure EraseInst sets MadeChange Summary: EraseInst didn't report that it made IR changes through MadeChange. It is essential that changes to the IR are reported correctly, since for example ReassociatePass::run() will indicate that all analyses are preserved otherwise. And the CGPassManager determines if the CallGraph is up-to-date based on status from InstructionCombiningPass::runOnFunction(). Reviewers: craig.topper, rnk, davide Reviewed By: rnk, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34616 llvm-svn: 306368	2017-06-27 05:32:13 +00:00
Hiroshi Inoue	5102028f63	[PowerPC] set optimization level in SelectionDAGISel PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level. This patch makes the PowerPC backend set the optimization level correctly. Differential Revision: https://reviews.llvm.org/D34615 llvm-svn: 306367	2017-06-27 04:52:17 +00:00
Craig Topper	f9319a78a5	[InstCombine] Add test cases demonstrating that we don't optmize select+cmp+cttz/ctlz when the bitwidth is larger than 64 bits. llvm-svn: 306365	2017-06-27 04:50:47 +00:00
Chandler Carruth	c691623585	[SROA] Further test cleanup and add a test for the actual propagation of the nonnull attribute distinct from rewriting it into an assume. llvm-svn: 306358	2017-06-27 03:08:45 +00:00
Chandler Carruth	3b978394ba	[SROA] Clean up a test case a bit prior to adding more testing for nonnull as part of fixing PR32902. llvm-svn: 306353	2017-06-27 02:23:15 +00:00
Matthias Braun	e2ae001982	ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags. Remove invalid shortcut in fixupKills(): A register needs to be marked live even when we are not adding a kill flag. This is because a partially live register must not get a kill flags, but it still needs to be fully marked live when walking backwards. llvm-svn: 306352	2017-06-27 00:58:48 +00:00
Wolfgang Pieb	9f65858235	DAGCombine: Make sure we only eliminate trunc/extend when the scales of truncation and extension match. This fixes PR33368. Reviewer: rksimon Differential Revision: https://reviews.llvm.org/D34069 llvm-svn: 306345	2017-06-26 23:05:51 +00:00
Dehao Chen	8b7effb344	revert r306336 for breaking ppc test. llvm-svn: 306344	2017-06-26 23:05:35 +00:00
Sanjay Patel	b859910eb2	[x86] add tests for missing sbb transforms; NFC llvm-svn: 306337	2017-06-26 22:20:07 +00:00
Dehao Chen	79655792cc	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306336	2017-06-26 21:41:09 +00:00
Dehao Chen	38f1bc7834	Fix the bug when handling shufflevector for aarch64. Summary: This Fixes https://bugs.llvm.org/show_bug.cgi?id=33600 Reviewers: mssimpso, davidxl, Carrot Reviewed By: mssimpso Subscribers: aemerson, rengolin, sanjoy, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D34641 llvm-svn: 306334	2017-06-26 21:33:51 +00:00
Matt Arsenault	53fae0772a	RenameIndependentSubregs: Fix iterator problem Fixes bug 33597. Use of substituteRegister in the tied operand case messes up the register use iterator, causing some uses to be left unprocessed. llvm-svn: 306333	2017-06-26 21:33:36 +00:00
Sam Clegg	933df2658d	[WebAssembly] Add more support for weak symbols Add weak symbol tests to MC Add symbol flags to output of `llvm-readobj -t`. Differential Revision: https://reviews.llvm.org/D34635 llvm-svn: 306330	2017-06-26 21:01:39 +00:00
Tim Northover	c2d5e6d637	AArch64: legalize G_EXTRACT operations. This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there. llvm-svn: 306328	2017-06-26 20:34:13 +00:00
Tim Northover	9ac3e42211	AArch64: remove all kill flags when extending register liveness. When we forward a stored value to a load and eliminate it entirely we need to make sure the liveness of the register is maintained all the way to its use. Previously we only cleared liveness on the store doing the forwarding, but there could be other killing uses in between. We already do the right thing when the load has to be converted into something else, it was just this one path that skipped it. llvm-svn: 306318	2017-06-26 18:49:25 +00:00
Simon Pilgrim	d58f051792	[X86][SSE] Check SSE2/SSE3 codegen tests on i686 and x86_64 llvm-svn: 306314	2017-06-26 18:20:46 +00:00
Wei Mi	71f06420e4	[GVN] Recommit the patch "Add phi-translate support in scalarpre". The recommit fixes three bugs: The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized. The second one is stop PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst is defined inside of CurrentBlock. The same value defined inside of loop in last iteration can not be regarded as available. The third one is an out-of-bound array access in a flipped if guard. Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundent because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. llvm-svn: 306313	2017-06-26 18:16:10 +00:00
Matt Arsenault	f28683cf51	AMDGPU: Setup SP/FP in callee function prolog/epilog llvm-svn: 306312	2017-06-26 17:53:59 +00:00
Zachary Turner	e79b07e41e	[llvm-pdbutil] Add a mode to `bytes` for dumping split debug chunks. llvm-svn: 306309	2017-06-26 17:22:36 +00:00
Ulrich Weigand	af98b748f6	[SystemZ] Fix missing emergency spill slot corner case We sometimes need emergency spill slots for the register scavenger. This may be the case when code needs to access a stack slot that has an offset of 4096 or more relative to the stack pointer. To make that determination, processFunctionBeforeFrameFinalized currently simply checks the total stack frame size of the current function. But this is not enough, since code may need to access stack slots in the caller's stack frame as well, in particular incoming arguments stored on the stack. This commit fixes the problem by taking argument slots into account. llvm-svn: 306305	2017-06-26 16:50:32 +00:00
Simon Pilgrim	f07663876a	[X86][SSE] Add combine tests for PMULDQ/PMULUDQ Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations llvm-svn: 306302	2017-06-26 16:22:52 +00:00
Ahmed Bougacha	58a197414e	[X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299	2017-06-26 16:00:24 +00:00
Tom Stellard	eb8f1e27d9	AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34589 llvm-svn: 306298	2017-06-26 15:56:52 +00:00
Simon Pilgrim	0ad0e5802b	[X86] Add test case for PR15981 llvm-svn: 306296	2017-06-26 15:53:11 +00:00
Sanjay Patel	15748d239e	[x86] transform vector inc/dec to use -1 constant (PR33483) Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289	2017-06-26 14:19:26 +00:00
Krzysztof Parzyszek	918e6d70bd	[Hexagon] Handle cases when the aligned stack pointer is missing llvm-svn: 306288	2017-06-26 14:17:58 +00:00
Jonas Paulsson	8c33647ba1	[SystemZ] Add a check against zero before calling getTestUnderMaskCond() Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered. This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered. Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll. Review: Ulrich Weigand llvm-svn: 306287	2017-06-26 13:38:27 +00:00
Michael Zuckerman	ce7e187f84	[X86][LLVM][test]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess test. Adding base tast (to trunk) for Store strid=4 vf=32. llvm-svn: 306286	2017-06-26 13:27:32 +00:00
Serguei Katkov	0e70206c8f	This reverts commit r306272. Revert "[MBP] do not rotate loop if it creates extra branch" It breaks the sanitizer build bots. Need to fix this. llvm-svn: 306276	2017-06-26 06:51:45 +00:00
Hiroshi Inoue	4484ff03df	fix trivial typo in comment, NFC llvm-svn: 306274	2017-06-26 06:32:04 +00:00
Serguei Katkov	b01fff06ed	[MBP] do not rotate loop if it creates extra branch This is a last fix for the corner case of PR32214. Actually this is not really corner case in general. We should not do a loop rotation if we create an additional branch due to it. Consider the case where we have a loop chain H, M, B, C , where H is header with viable fallthrough from pre-header and exit from the loop M - some middle block B - backedge to Header but with exit from the loop also. C - some cold block of the loop. Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch. Let's compute the change in number of branches: +1 branch from pre-header to header -1 branch from header to exit +1 branch from header to middle block if there is such -1 branch from cold bock to header if there is one So if C is not a predecessor of H then we introduce extra branch. This change actually prohibits rotation of the loop if both true 1) Best Exit has next element in chain as successor. 2) Last element in chain is not a predecessor of first element of chain. Reviewers: iteratee, xur Reviewed By: iteratee Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34271 llvm-svn: 306272	2017-06-26 05:27:27 +00:00
Matt Arsenault	10fc062b2b	AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264	2017-06-26 03:01:31 +00:00
Chandler Carruth	4a000883c7	[LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr. This was reverted in r306252, but I already had the bug fixed and was just trying to form a test case. The original commit factored the logic for forming dedicated exits inside of LoopSimplify into a helper that could be used elsewhere and with an approach that required fewer intermediate data structures. See that commit for full details including the change to the statistic, etc. The code looked fine to me and my reviewers, but in fact didn't handle indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty. If you have code that looks just right, you can end up leaking these predecessors into a subsequent rewrite, and crash deep down when trying to update PHI nodes for predecessors that don't exist. I've added an assert that makes the bug much more obvious, and then changed the code to reliably clear the vector so we don't get this bug again in some other form as the code changes. I've also added a test case that does manage to catch this while also giving some nice positive coverage in the face of indirectbr. The real code that found this came out of what I think is CPython's interpreter loop, but any code with really "creative" interpreter loops mixing indirectbr and other exit paths could manage to tickle the bug. I was hard to reduce the original test case because in addition to having a particular pattern of IR, the whole thing depends on the order of the predecessors which is in turn depends on use list order. The test case added here was designed so that in multiple different predecessor orderings it should always end up going down the same path and tripping the same bug. I hope. At least, it tripped it for me without manipulating the use list order which is better than anything bugpoint could do... llvm-svn: 306257	2017-06-25 22:45:31 +00:00
Chandler Carruth	73367b6a09	[LoopSimplify] Improve a test for loop simplify minorly. NFC. I did some basic testing while looking for a bug in my recent change to loop simplify and even though it didn't find the bug it seems like a useful improvement anyways. llvm-svn: 306256	2017-06-25 22:24:02 +00:00
Daniel Jasper	4c6cd4ccb7	Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility." This leads to a segfault. Chandler already has a test case and should be able to recommit with a fix soon. llvm-svn: 306252	2017-06-25 17:58:25 +00:00
Simon Pilgrim	9956364a1f	[X86] Add test case for PR15705 llvm-svn: 306246	2017-06-25 16:12:45 +00:00
Sanjay Patel	2f3ead7adc	[InstCombine] add (sext i1 X), 1 --> zext (not X) http://rise4fun.com/Alive/i8Q A narrow bitwise logic op is obviously better than math for value tracking, and zext is better than sext. Typically, the 'not' will be folded into an icmp predicate. The IR difference would even survive through codegen for x86, so we would see worse code: https://godbolt.org/g/C14HMF one_or_zero(int, int): # @one_or_zero(int, int) xorl %eax, %eax cmpl %esi, %edi setle %al retq one_or_zero_alt(int, int): # @one_or_zero_alt(int, int) xorl %ecx, %ecx cmpl %esi, %edi setg %cl movl $1, %eax subl %ecx, %eax retq llvm-svn: 306243	2017-06-25 14:15:28 +00:00
Elena Demikhovsky	72f991cded	AVX-512: Fixed a crash during legalization of <3 x i8> type The compiler fails with assertion during legalization of SETCC for <3 x i8> operands. The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>. Differential Revision: https://reviews.llvm.org/D34503 llvm-svn: 306242	2017-06-25 13:36:20 +00:00
Igor Breger	f5035d6ee5	[GlobalISel][X86] Support vector type G_EXTRACT selection. Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240	2017-06-25 11:42:17 +00:00
Dorit Nuzman	e0e0f1ddb0	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238	2017-06-25 08:26:25 +00:00
Xinliang David Li	b67530e9b9	[PGO] Implementate profile counter regiser promotion Differential Revision: http://reviews.llvm.org/D34085 llvm-svn: 306231	2017-06-25 00:26:43 +00:00
Hiroshi Inoue	a85d24b73d	fix trivial typos in comment, NFC llvm-svn: 306211	2017-06-24 16:00:26 +00:00
Hiroshi Inoue	b300824ee7	fix trivial typos in comment, NFC dereferencable -> dereferenceable llvm-svn: 306210	2017-06-24 15:43:33 +00:00
Hiroshi Inoue	95f24dca98	[SelectionDAG] set dereferenceable flag when expanding memcpy/memmove When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable. This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer. This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure. Differential Revision: https://reviews.llvm.org/D34467 llvm-svn: 306209	2017-06-24 15:17:38 +00:00
Rafael Espindola	b05f4a7b25	Add missing %s to RUN line. llvm-svn: 306199	2017-06-24 04:41:39 +00:00
Rafael Espindola	2c166857b3	Test the object file creation too. This should really be a llvm-mc test, but the parser is broken. See PR33579 for the parser bug. llvm-svn: 306198	2017-06-24 04:31:45 +00:00
Vitaly Buka	df19ad456e	[InstCombine] Don't replace allocas with smaller globals Summary: InstCombine replaces large allocas with small globals consts causing buffer overflows on valid code, see PR33372. This fix permits this optimization only if the global is dereference for alloca size. Fixes PR33372 Reviewers: eugenis, majnemer, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34311 llvm-svn: 306194	2017-06-24 01:35:19 +00:00
Nirav Dave	18c10c53d0	Update constants in complex-return test to prevent reduction to smaller constants llvm-svn: 306192	2017-06-24 01:29:24 +00:00
Zachary Turner	fa33282774	[llvm-pdbutil] Dump raw bytes of module symbols and debug chunks. llvm-svn: 306179	2017-06-23 23:08:57 +00:00
Rafael Espindola	801b42de31	ARM: move some logic from processFixupValue to applyFixup. processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177	2017-06-23 22:52:36 +00:00
Petar Jovanovic	53dbfb3798	Reland r306095: [mips] Fix reg positions in the aui/daui instructions After fixing (r306173) a failing test in the lld test suite (r306173), reland r306095. Original commit message: [mips] Fix register positions in the aui/daui instructions Swapped the position of the rt and rs register in the aui/daui instructions for mips32r6 and mips64r6. With this change, the format of the generated instructions complies with specifications and GCC. Patch by Milos Stojanovic. llvm-svn: 306174	2017-06-23 22:37:19 +00:00

... 3 4 5 6 7 ...

46048 Commits