Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark; further remarks related to
shrink-wrapping may be added later.
https://reviews.llvm.org/D35645
llvm-svn: 308556
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers which do not exist, e.g. BASEPRI_MAX_NS, were being allowed.
Ref: ARMv6/7/8M architecture reference manual.
Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209
llvm-svn: 308456
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header: in particular, when such a loop's
latch block is folded into the loop header, additional backedges are created and
LoopSimplify turns the loop into a nested loop, which prevents later optimizations
(e.g., loop unrolling and loop interleaving) from being applied.
This patch augments the existing algorithm by also checking whether the
destination of the branch belongs to the set of loop headers and, if so,
deferring its elimination to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
That part was reverted because the underlying change necessitating it
(r308025) was reverted in r308271.
Nirav re-landed r308025 again in r308350, so re-landing this fix.
llvm-svn: 308418
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instruction types
b. Registers used
5. Since itineraries are not added per instruction, throughput information is bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Reviewers: craig.topper, RKSimon
Subscribers: javed.absar, shivaram, ddibyend, vprasad
Differential Revision: https://reviews.llvm.org/D35293
llvm-svn: 308411
Re-recommitting after landing the DAG extension-crash fix.
Recommitting after adding a check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge writes from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
treat accesses to static allocas as frequently aliasing. Modify isAlias
so that accesses between static allocas, and accesses between static
allocas and other frame objects, are considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll,
which has a minor degradation despite the pre-legalized DAG being
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308350
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.
Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500
llvm-svn: 308343
When replacing a node and its operand, replacing the operand node may
cause the deletion of the original node, leading to an assertion
failure. Guard these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in the various extend
DAGCombiner cases.
Fixes PR32515.
Reviewers: dbabokin, RKSimon, davide, chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D34095
llvm-svn: 308330
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.
llvm-svn: 308326
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.
This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.
Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.
llvm-svn: 308325
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way. PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp() calls. If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall through the CGP expansion.
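As a minimal illustration (the IR value names here are hypothetical), this is the kind of fixed-size call that can now be expanded inline with scalar loads and compares instead of a libcall:
declare i32 @memcmp(i8*, i8*, i64)
%cmp = call i32 @memcmp(i8* %a, i8* %b, i64 8)
%res = icmp eq i32 %cmp, 0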
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
The flag "-hexagon-emit-lut-text" (default: false) is added to control
where the switch-generated lookup table is placed.
Differential Revision: https://reviews.llvm.org/D34818
llvm-svn: 308316
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:
1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).
However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.
Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.
Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35416
llvm-svn: 308314
As discussed by @spatel on D35067:
"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."
llvm-svn: 308309
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
This change introduces additional machine instructions in the functions
dealing with the expansion of MSA pseudo f16 instructions, because the
register classes previously used were inappropriate when checked with the
machine verifier.
Differential Revision: https://reviews.llvm.org/D34276
llvm-svn: 308301
The commit r308100 updated WebAssembly tests for r308025. In one case it
merely made the test more resilient but in another case it made
a substantive update. Because r308025 was reverted in r308271, these
changes to the test also need to be reverted. They should be folded into
the recommit of r308025 when it is ready.
llvm-svn: 308273
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.
Fixes PR33772.
llvm-svn: 308267
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.
llvm-svn: 308226
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
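For illustration (the function name is hypothetical), a declaration using the new spelling in textual IR:
declare win64cc void @callee()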
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since each 128-bit vector register can hold only one 128-bit
float value. However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we can hold up to 32
values in registers).
Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions. This includes allocating the f128
type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.
Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).
llvm-svn: 308196
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
The target-independent lowering works fine, except concatenating 32-bit
words. Add a pattern to generate A2_combinew instead of 64-bit asl/or.
llvm-svn: 308186
Summary:
Previously, CodeGen checked the first src operand's type to determine whether omod is supported by an instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but doesn't support omod.
Changed the .td files to check the dst operand instead of the src operand.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D35350
llvm-svn: 308179
The LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s); these are later lowered into an X86::CMOV instruction, assuming no other optimization eliminated the SelectInst.
However, it is not always profitable to emit an X86::CMOV instruction. For example, a branch is preferable over an X86::CMOV instruction when:
1. The branch is well predicted.
2. The condition operand is expensive, compared to the True-value and the False-value operands.
The CodeGenPrepare pass has a shallow optimization that tries to convert a SelectInst into a branch, but it is not enough.
This commit implements a machine optimization pass that converts X86::CMOV instruction(s) into branches, based on a conservative heuristic.
Differential Revision: https://reviews.llvm.org/D34769
llvm-svn: 308142
If the `long-calls` feature flag is enabled, disable use of the `jal`
instruction. Instead, call a function by first loading its
address into a register and then using the contents of that register.
Differential revision: https://reviews.llvm.org/D35168
llvm-svn: 308087
The type needs to be cast back to the original argument type.
Fixes an assert that, for some reason, is only exercised when
using -debug.
Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
Currently, for code like below,
===
inner_map = bpf_map_lookup_elem(outer_map, &port_key);
if (!inner_map) {
inner_map = &fallback_map;
}
===
the compiler generates (pseudo) code like the below:
===
I1: r1 = bpf_map_lookup_elem(outer_map, &port_key);
I2: r2 = 0
I3: if (r1 == r2)
I4: r6 = &fallback_map
I5: ...
===
During the kernel verification process, after I1, r1 holds the state
map_ptr_or_null. If the I3 condition is not taken
(path [I1, I2, I3, I5]), r1 should become map_ptr.
Unfortunately, the kernel does not recognize this pattern
and r1 remains map_ptr_or_null at insn I5. This will cause
a verification failure later on.
The kernel, however, is able to recognize the pattern "if (r1 == 0)"
properly and give r1 the map_ptr state in the above case.
LLVM here generates suboptimal code which causes the kernel verification
failure. This patch fixes the issue by changing BPF insn pattern
matching and lowering to generate proper code when the right-hand
operand of the above condition is a constant. A test case
is also added.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 308080
Restricting register class to PointerRegClass for memory operands.
Also fix the PointerRegClass for AArch64 from GPR64 to GPR64sp, since
XZR cannot hold a memory pointer while SP can.
Fixes PR33134.
Differential Revision: https://reviews.llvm.org/D34999
llvm-svn: 308060
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34963
llvm-svn: 308059
In moveToVALU(), when a move to the vector ALU is performed, all instructions in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Differential Revision: https://reviews.llvm.org/D34726
llvm-svn: 308039
This is the LLVM part, adding definitions for
void @llvm.hexagon.Y2.dccleana(i8*)
void @llvm.hexagon.Y2.dccleaninva(i8*)
void @llvm.hexagon.Y2.dcinva(i8*)
void @llvm.hexagon.Y2.dczeroa(i8*)
void @llvm.hexagon.Y4.l2fetch(i8*, i32)
void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.
llvm-svn: 308032
Recommitting after adding a check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge writes from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
treat accesses to static allocas as frequently aliasing. Modify isAlias
so that accesses between static allocas, and accesses between static
allocas and other frame objects, are considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll,
which has a minor degradation despite the pre-legalized DAG being
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308025
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends the size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since
that caused some sanitizer tests to fail (so the previous patch was reverted).
Differential Revision: https://reviews.llvm.org/D34511
llvm-svn: 308011
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).
llvm-svn: 308009
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.
Patch by Philip Ginsbach
Differential Revision: https://reviews.llvm.org/D33936
llvm-svn: 308004
Since GFX9 supports denorm modes for v_min_f32/v_max_f32, it
is possible to further optimize fcanonicalize and remove it
when applied to min/max, given that their operands are known not to be
an sNaN or that sNaNs are not supported.
Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.
Differential Revision: https://reviews.llvm.org/D35335
llvm-svn: 307976
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.
Differential Revision: https://reviews.llvm.org/D35007
llvm-svn: 307934
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
Differential Revision: https://reviews.llvm.org/D35006
llvm-svn: 307928
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
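As a rough IR-level sketch of the expansion for a 64-bit add on a 32-bit target (value names are illustrative; the real transformation operates on SelectionDAG nodes):
%lo = add i32 %a.lo, %b.lo
%carry = icmp ult i32 %lo, %a.lo        ; the SETCC produces 0 or 1 directly
%carry.ext = zext i1 %carry to i32
%hi.tmp = add i32 %a.hi, %b.hi
%hi = add i32 %hi.tmp, %carry.ext       ; carry consumed without a select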
Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.
This resolves PR32713 and PR33424.
Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D33494
The previous version of this patch was too aggressive in producing fused
integer multiply-add instructions.
llvm-svn: 307906
This boils down to not crashing in reg bank select due to the lack of
register operands on this instruction, and adding some tests. The
instruction selection is already covered by the TableGen'erated code.
llvm-svn: 307904
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.
Add implementation of this hook for AArch64 SuppressPair MMO flag.
Reviewers: bogner, hfinkel, qcolombet, MatzeB
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34962
llvm-svn: 307877
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.
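For illustration (pointer and length names are hypothetical), a call with a 4-byte element size would look like:
call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 4 %dst, i8 0, i32 %len, i32 4)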
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34885
llvm-svn: 307854
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
It is possible to omit this multiplication if the source of the
fcanonicalize instruction is known to be already flushed/quieted, i.e.
if it comes from another instruction known to do the normalization,
and we are using IEEE mode to quiet sNaNs.
Differential Revision: https://reviews.llvm.org/D35218
llvm-svn: 307848
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the node to a regular IMPLICIT_DEF.
llvm-svn: 307817
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.
Differential Revision: https://reviews.llvm.org/D35272
llvm-svn: 307797
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34884
llvm-svn: 307796
Summary:
The NetBSD shell sh(1) does not support the ">& /dev/null" construct.
This is a bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".
This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, dim, rnk
Reviewed By: joerg, rnk
Subscribers: rnk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35277
llvm-svn: 307789
When we have a diamond ifcvt, the fallthrough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.
Differential Revision: https://reviews.llvm.org/D34952
llvm-svn: 307788
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces the *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
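For example (the target-specific scope name below is hypothetical):
fence syncscope("singlethread") seq_cst
store atomic i32 %v, i32* %p syncscope("workgroup") seq_cst, align 4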
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
1. The program storage region of the red zone available to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
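As an illustrative check with n = 16: for x = -36, x & ~(n-1) gives -48, correctly rounding the negative offset away from zero, whereas (x + (n-1)) & ~(n-1) gives -32, which rounds toward zero and reserves too little space.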
Differential Revision: https://reviews.llvm.org/D34337
llvm-svn: 307672
Use CHECK-NEXT for the comparison sequence, to make sure we don't get
any unexpected instructions in the middle of our flag manipulation
efforts.
llvm-svn: 307656
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).
llvm-svn: 307647
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.
llvm-svn: 307639
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is an NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
llvm-svn: 307634
This is a second attempt to land this patch.
The first one resulted in a crash of the clang sanitizer buildbot.
The fix is included here and a regression test is added.
This is a final fix for the corner case of PR32214. Actually this is not really a corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C, where
H is the header, with a viable fallthrough from the pre-header and an exit from the loop,
M is some middle block,
B is the backedge to the header, but also with an exit from the loop,
C is some cold block of the loop.
Say H is determined to be the best exit. If we rotate the loop so that the chain becomes M, B, C, H, we can introduce an extra branch.
Let's compute the change in the number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is one
-1 branch from cold block to header if there is one
So if C is not a predecessor of H, then we introduce an extra branch.
This change prohibits rotation of the loop if both of the following are true:
The best exit has the next element in the chain as a successor.
The last element in the chain is not a predecessor of the first element of the chain.
Reviewers: iteratee, xur, sammccall, chandlerc
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745
llvm-svn: 307631
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all the instructions computing the address for a memory instruction are in the same
block as the memory instruction itself.
However, if any of these instructions are placed after the memory instruction, then the
address calculation will not be folded into the memory instruction.
The added test case shows an example.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862
llvm-svn: 307628
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.
This reverts commit r307546.
llvm-svn: 307589
For this example:
float test (int *arr) {
return arr[2];
}
We currently generate the following code:
li r4, 8
lxsiwax f0, r3, r4
xscvsxdsp f1, f0
With this patch, we will now generate:
addi r3, r3, 8
lxsiwax f0, 0, r3
xscvsxdsp f1, f0
Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027
llvm-svn: 307553
Memory accesses offset from frame indices may alias, e.g., we
may merge writes from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocas are realized as stack objects in SelectionDAG, but their
offsets are not set until post-DAG, causing DAGCombiner's alias check to
treat accesses to static allocas as frequently aliasing. Modify isAlias
so that accesses between static allocas, and accesses between static
allocas and other frame objects, are considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll,
which has a minor degradation despite the pre-legalized DAG being
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 307546
The SandyBridge architects have provided us with more accurate information about each instruction's latency, number of uOPs and used ports, and I used it to replace the existing estimated SNB instruction scheduling and to add missing scheduling information.
Please note that the patch extensively affects the X86 MC instr scheduling for SNB.
Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.
The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps of which the instruction consists
•all ports used by the instruction's uOPs
For example, the following code dictates that the instructions ADC64mr, ADC8mr, SBB64mr, and SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use port 4, ports 2 or 3, port 0, and ports 0, 1 or 5:
def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;
Note that apart from the header, most of the X86SchedSandyBridge.td file was generated by a script.
Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D35019#inline-304691
llvm-svn: 307529
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection (C++ code).
This patch is required to support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35177
llvm-svn: 307526
WidenVSELECTAndMask can fold (and it does fold in this case), so we
get a BUILD_VECTOR of constants as the mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end, but the current
code just asserts on anything but a SETCC or an AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.
Fixes PR33715.
llvm-svn: 307508
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.
Fixes PR33094.
Submitted on behalf of @Praetonus (Benoit Vey)
Differential Revision: https://reviews.llvm.org/D34538
llvm-svn: 307502
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. Selecting 0 or -1
needs extra attention to produce the optimal code as shown here.
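One sketch of the kind of equivalence involved (value names are illustrative): selecting between -1 and 0 is the same as sign-extending the condition:
%sel = select i1 %c, i32 -1, i32 0
; produces the same value as:
%ext = sext i1 %c to i32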
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)
As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.
llvm-svn: 307471
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partial redef would have triggered a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given we don't track which lanes are still live, we cannot set the
kill flag in such cases.
Note: This bug has been latent for about 7 years (r104056).
llvm.org/PR33677
llvm-svn: 307428
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. DAGCombiner already
has the foundation to allow the transforms, so we just need to fill in the holes
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
Differential Revision: https://reviews.llvm.org/D34652
llvm-svn: 307404
Summary:
This change gives a 0.89% speedup in execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga
Reviewed By: kristof.beyls
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34954
llvm-svn: 307389
Summary:
This change gives a 0.34% speedup in execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar
Reviewed By: kristof.beyls
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34961
llvm-svn: 307380
We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
with the MOVi, or move 1 into the result register, based on the values
of the flags
As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.
Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.
llvm-svn: 307365
When scavenging for a use in instruction MI, we will reload after
that instruction and hence cannot spill uses/defs of this instruction.
This fixes http://llvm.org/PR33687
llvm-svn: 307352
Adds loop expansions for known-size and unknown-size memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and match the existing behaviour
if they aren't overridden by the target.
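A rough sketch of the kind of loop emitted for an unknown-size copy with the default i8 operand type (block and value names are illustrative only; the guards around the loop are omitted):
loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %s = getelementptr inbounds i8, i8* %src, i64 %i
  %d = getelementptr inbounds i8, i8* %dst, i64 %i
  %byte = load i8, i8* %s
  store i8 %byte, i8* %d
  %i.next = add i64 %i, 1
  %done = icmp eq i64 %i.next, %len
  br i1 %done, label %exit, label %loop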
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
The patch adjusts the strategy of frequency-based consthoisting:
Previously, when the candidate block has the same frequency as the existing
blocks containing a const, the pass would not hoist the const to the candidate block.
For that case, we now change the strategy to hoist the const only if the set of
existing blocks has more than one member. This is helpful for reducing code size.
Differential Revision: https://reviews.llvm.org/D35084
llvm-svn: 307328
The patch adds support for lowering i128 params. The changes are quite trivial,
supporting i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 307326
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}
The current implementation ignores the fast flag and favors the metadata. An
instruction with just the fast flag would be lowered to the fastest rcp +
mul, but that never happens in practice because of the described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
When the formulae search space is huge, LSR uses a series of heuristics to keep
pruning the search space until the number of possible solutions is within a
certain limit.
The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use that register. This is an effective way to prune the search
space, but quite often not a good way to keep the best solution. We have seen cases before
where the heuristic pruned the best formula candidate out of the search space.
To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose from, we have a better opportunity to find the best solution.
That is why we believe pruning the search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerRegs. We use
two criteria to choose the best formula with the same Scale and ScaledReg: the first
criterion is to select the formula using fewer non-shared registers, and the second
criterion is to select the formula with the lower cost computed by RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.
Testing shows we get 1.8% and 2% improvements on two internal benchmarks on x86. The LLVM nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.
Differential Revision: https://reviews.llvm.org/D34583
llvm-svn: 307269
Summary: Added MachineVerifier code to check register ties more thoroughly, especially so that physical registers that are tied are the same. This may help e.g. when creating MIR files.
Original patch by Jesper Antonsson
Reviewers: stoklund, sanjoy, qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D34394
llvm-svn: 307259
Summary:
During remat, some subranges might end up having invalid segments, which caused problems for later
coalescing.
Added a check to remove segments that are invalidated as part of the remat.
See http://llvm.org/PR33524
Subscribers: MatzeB, qcolombet
Differential Revision: https://reviews.llvm.org/D34391
llvm-svn: 307247
This covers both hard and soft float.
Hard float is easy, since it's just Legal.
Soft float is more involved, because there are several different ways to
handle it based on the predicate: one and ueq need not only one, but two
libcalls to get a result. Furthermore, we have large differences between
the values returned by the AEABI and GNU functions.
AEABI functions return a nice 1 or 0 representing true and false
respectively. GNU functions generally return a value that needs to be compared
against 0 (e.g. for ogt, the value returned by the libcall is > 0 for
true). We could introduce redundant comparisons for AEABI as well, but
they don't seem easy to remove afterwards, so we do different processing
based on whether or not the result really needs to be compared against
something (and just truncate if it doesn't).
llvm-svn: 307243
If we are lowering a libcall after legalization, we'll split the return type into a pair of legal values.
Patch by Jatin Bhateja and Eli Friedman.
Differential Revision: https://reviews.llvm.org/D34240
llvm-svn: 307207
For two ROTR operations with shifts C1 and C2, the combined shift operand will be (C1 + C2) % bitsize.
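For example, on an i32 value, rotating right by 20 and then by 15 folds into a single rotate right by (20 + 15) % 32 = 3.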
Differential revision: https://reviews.llvm.org/D12833
llvm-svn: 307179
This patch adds exploitation of the new Power 9 instructions which extract
variable elements from vectors:
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
llvm-svn: 307174
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign-extended
vector extract elements whose indices are not correct. We can still use
the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009
llvm-svn: 307169
Relanding after rewriting the undef.ll test to avoid host-dependent
endianness.
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D34472
llvm-svn: 307114
We should rewrite this using the generic branch relaxation pass, but for
the moment having this pass is better than hitting an assertion error.
llvm-svn: 307109
Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks.py tool, as follows:
1. Removing the llc flag -asm-verbose=false
2. Grouping the multiple check-prefix directives
3. Applying the update_llc_test_checks.py tool on the test
This change is needed to easily update scheduling changes in an upcoming patch.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D34934
llvm-svn: 307108
Move from the generic directory to the X86 directory since gc intrinsics are only supported on
X86 64-bit.
Add target triple as well.
Fixes build failure in i686-linux-RA caused by rL307084.
llvm-svn: 307086
Summary:
We are crashing in llc at O0 when gc intrinsics are present in the block.
The reason is that FastISel performs basic block ISel by modifying gc.relocates
to be the first instruction in the block. This can cause us to visit the
gc.relocate before its corresponding gc.statepoint is visited, which is incorrect.
When we lower the statepoint, we record the base and derived pointers, along
with the gc.relocates. After this we can visit the gc.relocate.
This patch prevents FastISel from incorrectly creating the block with the gc.relocate
as the first instruction.
Reviewers: qcolombet, skatkov, qikon, reames
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34421
llvm-svn: 307084
Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly.
The changes to the test are needed for an upcoming scheduling patch.
Reviewers: zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D34935
llvm-svn: 307066
Summary:
When broadcasting from the constant pool it's useful to print out the final vector, similar to what we do for normal moves from the constant pool.
I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead.
Reviewers: spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34923
llvm-svn: 307062
Previously, if a basic block ended with a FRMIDX instruction, we would
end up doing something like this:
*std::next(MBB.end())
which would hit an error:
"Assertion `!NodePtr->isKnownSentinel()' failed."
llvm-svn: 307057
The patch makes the SoftenFloatResult/Operand logic the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() performs all needed replacements. This prevents the softening machinery from leaving stale entries in the SoftenFloats map (which resulted in errors during legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265.
Differential Revision: https://reviews.llvm.org/D31946
llvm-svn: 307053
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.
Example:
v8i32 V = ...
v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)
Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.
Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena
Reviewed By: delena
Subscribers: guyblank, delena, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34077
llvm-svn: 307036
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also over-specifications in the RUN params such as CPU model.
llvm-svn: 307033
These all used 'CHECK-NOT' which isn't necessary if we have complete checks.
There were also several over-specifications in the RUN params such as CPU model or OS requirement
llvm-svn: 307028
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads, and the destination register is destructive.
llvm-svn: 306978
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads, and the destination register is destructive.
The 32-bit shuffles are a bit tricky and will be dealt with in a later patch.
llvm-svn: 306977
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads, and the destination register is destructive.
llvm-svn: 306976
This is NFC after rerunning the "update_llc_test_checks.py" tool on the CodeGen X86 tests in order to submit a patch.
Minor differences due to added "End of Function" lines.
Reviewers: zvi
Differential Revision: https://reviews.llvm.org/D34933
llvm-svn: 306973
Summary: Support the G_GLOBAL_VALUE operation. For now, most of the PIC configurations are not implemented yet.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34738
llvm-svn: 306972
Summary:
Support vector type G_UNMERGE_VALUES selection.
For now, G_UNMERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer.
Reviewers: t.p.northover, qcolombet, zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D33665
llvm-svn: 306971
This enables us to ensure better LTO and code generation in the face of module linking.
Remove a report_fatal_error from the TargetMachine and replace it with an assert in ARMSubtarget - and remove the test that depended on the error. The assertion will still fire in the case that we were reporting before, but error reporting needs to be in front end tools if possible for options parsing.
llvm-svn: 306939
With fix for use-after-free errors. We can't add the new branch and
remove the old one until we are done with the Builder constructed for
the block.
llvm-svn: 306937
Summary:
Add an option to prevent diagnostics that do not meet a minimum hotness
threshold from being output. When generating optimization remarks for
large codebases with a ton of cold code paths, this option can be used
to limit the optimization remark output at a reasonable size. Discussion of
this change can be read here:
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114377.html
Reviewers: anemet, davidxl, hfinkel
Reviewed By: anemet
Subscribers: qcolombet, javed.absar, fhahn, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34867
llvm-svn: 306912
This is a short-term fix for PR33650 aimed to get the modules build bots green again.
Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.
Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)
Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.
Differential Revision: https://reviews.llvm.org/D34907
Corresponding updates to clang, clang-tools-extra, and lld to follow.
llvm-svn: 306878
It looks like there are two target-independent but non-GISel instructions that
need legalization, IMPLICIT_DEF and PHI. These are already anomalies since
their operands have important LLTs attached, so to make things more uniform it
seems like a good idea to add generic variants. Starting with G_IMPLICIT_DEF.
llvm-svn: 306875
This patch adds a new LLVM flag -hexagon-emit-jt-text which defaults to
"false". The value "true" emits the switch-generated jump tables in the text section.
Differential Revision: https://reviews.llvm.org/D34820
llvm-svn: 306872
The llvm flag "-hexagon-emit-lookup-tables" guards the generation
of lookup tables from a switch statement.
Differential Revision: https://reviews.llvm.org/D34819
llvm-svn: 306869
On big-endian machines the high and low parts of the value accessed by ldrexd
and strexd are swapped around. To account for this we swap inputs and outputs
in ISelLowering.
Patch by Bharathi Seshadri.
llvm-svn: 306865
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the
update script.
llvm-svn: 306861
As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using
generic checks. Also, propagate missing local handling from there to
BaseIndexOffset checks.
Tests of note:
* test/CodeGen/X86/build-vector* - Improved.
* test/CodeGen/BPF/undef.ll - Improved store alignment allows an
additional store merge
* test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a
case we already do not handle well. Here, the DAG is improved, but
scheduling causes a code size degradation.
Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D34472
llvm-svn: 306819
Networking-type BPF programs often need to access packet data.
A context data structure with two fields is provided to the BPF
programs:
u32 data;
u32 data_end;
Users can access these two fields with ctx->data and ctx->data_end.
During program verification process, the kernel verifier modifies
the bpf program with loading of actual pointer value from kernel
data structure.
r = ctx->data ===> r = actual data start ptr
r = ctx->data_end ===> r = actual data end ptr
A typical program accessing ctx->data like
char *data_ptr = (char *)(long)ctx->data
will result in a 32-bit load followed by a zero extension.
Such an operation is combined into a single LDW in DAG combiner
as bpf LDW does zero extension automatically.
In cases like the below (which can be a result of global value numbering
and partial redundancy elimination before insn selection):
B1:
u32 a = load-32-bit &ctx->data
u64 pa = zext a
...
B2:
u32 b = load-32-bit &ctx->data
u64 pb = zext b
...
B3:
u32 m = PHI(a, b)
u64 pm = zext m
In B3, "pm = zext m" cannot be removed, which although is legal
from compiler perspective, will generate incorrect code after
kernel verification.
This patch recognizes this pattern and traces through PHI node
to see whether the operand of "zext m" is defined with LDWs or not.
If it is, the "zext m" itself can be removed.
The patch also recognizes the pattern where the load and the use of
the loaded value are not in the same basic block; in that case the truncate
operation may be removed as well.
The patch handles 1-byte, 2-byte and 4-byte truncation.
Two test cases are added to verify the transformation happens properly
for the above code pattern.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 306685
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block, since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.
Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling, to prevent _tls_get_addr from being scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new one.
Differential Revision: https://reviews.llvm.org/D34347
llvm-svn: 306678
Summary:
Support vector-type G_MERGE_VALUES selection. For now, G_MERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer.
Split from https://reviews.llvm.org/D33665
Reviewers: qcolombet, t.p.northover, zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D33958
llvm-svn: 306665
Summary:
TBB and TBH allow using a Thumb GPR or the PC as the destination operand.
A few machine verifier failures were due to those instructions not
expecting the PC as a destination operand.
Add -verify-machineinstrs to test/CodeGen/ARM/jump-table-tbh.ll to add
test coverage even if expensive checks are disabled.
Reviewers: MatzeB, t.p.northover, jmolloy
Reviewed By: MatzeB
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34610
llvm-svn: 306654
Given no NaNs and no signed zeroes it folds:
(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)
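In C++ terms (a sketch only; the equivalence holds under the stated no-NaNs
and no-signed-zeros assumptions), the first fold corresponds to:

  #include <cmath>

  double before(double x) { return x * (x > 0.0 ? -1.0 : 1.0); }
  double after(double x)  { return -std::fabs(x); }   // (fneg (fabs X))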
Differential Revision: https://reviews.llvm.org/D34579
llvm-svn: 306592
Some conditional branch instructions generated by this pass are checking
the wrong condition code. The instructions TBZ and TBNZ are transformed
into B.GE and B.LT instead of B.PL and B.MI respectively. They should
only be checking the Negative bit.
Differential Revision: https://reviews.llvm.org/D34743
llvm-svn: 306550
The current heuristic in isProfitableToIfCvt assumes we have a branch predictor,
and so gives the wrong answer in some cases when we don't. This patch adds a
subtarget feature to indicate that a subtarget has no branch predictor, and
changes the heuristic in isProfitableToIfCvt when it's present. This gives a
slight overall improvement in a set of embedded benchmarks on Cortex-M4 and
Cortex-M33.
Differential Revision: https://reviews.llvm.org/D34398
llvm-svn: 306547
Summary:
I was experimenting with using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.
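A minimal sketch of the problem (not the actual expansion code; names are
illustrative): the emitted loop must be guarded so that a zero byte count
performs no iterations.

  #include <cstddef>

  void expandedMemcpy(unsigned char *dst, const unsigned char *src, size_t len) {
    if (len == 0)   // the zero-length check that was missing before the loop
      return;
    size_t i = 0;
    do {            // loop body as the expansion conceptually emits it
      dst[i] = src[i];
      ++i;
    } while (i < len);
  }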
Reviewers: hfinkel
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34707
llvm-svn: 306541
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.
The majority of the changes in this patch:
1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.
These changes are target independent and described below.
Changed CFI instructions so that they:
1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal
Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.
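A simplified model of that check (hypothetical types; the real pass works on
MachineBasicBlocks):

  #include <cassert>
  #include <vector>

  struct CFAInfo {
    int Offset;
    unsigned Register;
    bool operator==(const CFAInfo &O) const {
      return Offset == O.Offset && Register == O.Register;
    }
  };

  struct Block {
    CFAInfo Incoming, Outgoing;
    std::vector<const Block *> Succs;
  };

  // Every CFG edge must have matching CFA info on both ends.
  void verifyCFAInfo(const std::vector<Block> &Blocks) {
    for (const Block &B : Blocks)
      for (const Block *S : B.Succs)
        assert(B.Outgoing == S->Incoming && "CFA info mismatch across CFG edge");
  }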
Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D18046
llvm-svn: 306529
The benchmarking summarized in
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed
this is beneficial for a wide range of cores.
As is to be expected, quite a few small adaptations are needed to the
regression tests, as the difference in scheduling results in:
- Quite a few small instruction schedule differences.
- A few changes in register allocation decisions caused by different
instruction schedules.
- A few changes in IfConversion decisions, due to a difference in
instruction schedule and/or the estimated cost of a branch mispredict.
llvm-svn: 306514
It is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
the trunc (shl) into shl (trunc) conversion given the known value
range of the shift amount.
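In C++ terms (illustrative only), the narrowing is safe because the mask
guarantees the shift amount is below 32, so the 32-bit shift is well defined
and produces the same low 32 bits as the 64-bit shift:

  #include <cstdint>

  uint32_t wide(uint64_t x, uint32_t amt)   { return (uint32_t)(x << (amt & 31)); }
  uint32_t narrow(uint64_t x, uint32_t amt) { return (uint32_t)x << (amt & 31); }
  // wide and narrow compute the same value for all inputs.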
Differential Revision: https://reviews.llvm.org/D34723
llvm-svn: 306499
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne i32 %s, 0
=>
%r = icmp ne i32 %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
llvm-svn: 306471
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.
At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.
llvm-svn: 306470
Depending on the compare code, that can be either an argument of the
sext or its negation. This helps to avoid a v_cndmask_b64 instruction
for the sext. A reversed value can be further simplified and folded into
its parent comparison if possible.
Differential Revision: https://reviews.llvm.org/D34545
llvm-svn: 306446