Commit Graph

12454 Commits

Andrea Di Biagio b998eae2f2 [X86][BtVer2] Fix WriteFShuffle256 schedule write info.
This patch fixes the number of micro opcodes and the processor resource cycles for
the following AVX instructions:

vinsertf128rr/rm
vperm2f128rr/rm
vbroadcastf128

Tests have been regenerated using the usual scripts in the llvm/utils directory.

Differential Revision: https://reviews.llvm.org/D51492

llvm-svn: 341185
2018-08-31 08:30:47 +00:00
Craig Topper 7073f03f70 [X86] Add a -x86-experimental-vector-widening command line to vec_fp_to_int.ll.
llvm-svn: 341173
2018-08-31 07:05:38 +00:00
Craig Topper 2140a8e307 [X86] Add -x86-experimental-vector-widening-legalization run line to avx512-cvt.ll
This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it.

llvm-svn: 341171
2018-08-31 07:05:36 +00:00
Michael Berg 7b9e86445c [NFC] adding initial intersect test for Node to Instruction association
llvm-svn: 341138
2018-08-30 22:43:34 +00:00
Craig Topper b5de35a5ba [X86] Add -x86-experimental-vector-widening-legalization command lines to vector-idiv-v2i32.ll
If we're legalizing via widening already, then the type legalizer will scalarize the divs/rems as i32.

llvm-svn: 341108
2018-08-30 20:10:10 +00:00
Craig Topper 1a8c99e670 [X86] Weaken an overly aggressive assert.
This assert tried to check that AND constants are only on the RHS. But it's possible for both operands to be constants if one is opaque, which will prevent the AND from being constant folded.

Fixes PR38771

llvm-svn: 341102
2018-08-30 19:35:38 +00:00
Craig Topper b7e14332ea [X86] Add kshift test cases for D51401. NFC
llvm-svn: 341088
2018-08-30 17:51:02 +00:00
Vladimir Stefanovic 7e58ebf6b8 Allow inconsistent offsets for 'noreturn' basic blocks when '-verify-cfiinstrs'
With r295105, some 'noreturn' blocks (those that don't return and have no
successors) may be merged.
If such blocks' predecessors have different outgoing offset or register, don't
report an error in CFIInstrInserter verify().

Thanks to Vlad Tsyrklevich for reporting the issue.

Differential Revision: https://reviews.llvm.org/D51161

llvm-svn: 341087
2018-08-30 17:31:38 +00:00
Roman Lebedev 26a1836757 [NFC][CodeGen][SelectionDAG] Tests for X % C == 0 codegen improvement.
Hacker's Delight 10-17: when C is constant,
the result of X % C == 0 can be computed more cheaply
without actually calculating the remainder.
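
For intuition, here is a scalar C++ sketch of the odd-divisor case (function names are ours, not from the patch): multiply by the modular inverse of C and compare against a precomputed bound, with no remainder ever computed.

```cpp
#include <cstdint>

// Modular inverse of an odd constant mod 2^32 via Newton's iteration;
// each step roughly doubles the number of correct low bits.
constexpr uint32_t modinv32(uint32_t c) {
    uint32_t x = c;                 // correct to 3 bits for any odd c
    for (int i = 0; i < 5; ++i)
        x *= 2 - c * x;
    return x;
}

// Hacker's Delight 10-17: for odd C, x % C == 0 iff
// x * modinv32(C) (mod 2^32) <= (2^32 - 1) / C.
bool isMultipleOf7(uint32_t x) {
    return x * modinv32(7) <= UINT32_MAX / 7;
}
```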

The motivation is discussed here:
https://bugs.llvm.org/show_bug.cgi?id=35479.

Patch by: hermord (Dmytro Shynkevych)!

For https://reviews.llvm.org/D50222

llvm-svn: 341047
2018-08-30 09:32:21 +00:00
Carlos Alberto Enciso 06adfa1718 [DWARF] Missing location debug information with -O2.
Check that Machine CSE correctly handles, during the transformation, the
debug location information for local variables.

Differential Revision: https://reviews.llvm.org/D50887

llvm-svn: 341025
2018-08-30 07:17:41 +00:00
Andrew V. Tischenko 62f7a3207b [X86] Improved sched model for X86 CMPXCHG* instructions.
Differential Revision: https://reviews.llvm.org/D50070 

llvm-svn: 341024
2018-08-30 06:26:00 +00:00
Craig Topper b7b353be60 [X86] Make Feature64Bit useful
We now only add +64bit to the CPU string for the "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've removed the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or the user passing -mattr=+64bit.

I've changed the assert to a report_fatal_error so it's not lost in Release builds.

The test updates are to fix things that tripped the new error.

Differential Revision: https://reviews.llvm.org/D51231

llvm-svn: 341022
2018-08-30 06:01:05 +00:00
Craig Topper 987ef2ddfd [X86] Update test command line to not use 64-bit mode on a 32-bit only athlon cpu.
llvm-svn: 341021
2018-08-30 06:01:03 +00:00
Craig Topper 2b3edb902d [X86] Remove powerpc cpu name and features from uwtables.ll
llvm-svn: 341020
2018-08-30 06:01:01 +00:00
Martin Storsjo 489993db94 [MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.

For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.

Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.

Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.

Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.

GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.

Differential Revision: https://reviews.llvm.org/D51288

llvm-svn: 340942
2018-08-29 17:28:34 +00:00
Simon Pilgrim b49d5f3b53 [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds
Adds more divrem folds to try and get in sync with InstructionSimplify

Differential Revision: https://reviews.llvm.org/D50636

llvm-svn: 340919
2018-08-29 11:30:16 +00:00
Simon Pilgrim 09cc7af85a [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds (test tweaks)
Adjust a missed test to avoid the X / X -> 1 & X % X -> 0 folds while keeping its original purpose.

Differential Revision: https://reviews.llvm.org/D50636

llvm-svn: 340917
2018-08-29 11:23:59 +00:00
Simon Pilgrim 6d71c4cfe3 [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds (test tweaks)
Adjust tests to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.

Differential Revision: https://reviews.llvm.org/D50636

llvm-svn: 340916
2018-08-29 11:18:14 +00:00
Simon Pilgrim 6b9bf7ecbc [X86][AVX] Prefer VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles
Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead.

TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD.
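
To illustrate the idea with intrinsics (a sketch of the decomposition, not the shuffle-lowering code itself): a v16i16 blend whose mask differs between the two 128-bit lanes can over-blend with VPBLENDW's per-lane imm8 and then repair whole dwords with VPBLENDD's full 8-dword imm8, avoiding the 32-byte mask load that VPBLENDVB needs.

```cpp
#include <immintrin.h>

// Goal: take word 0 of the low lane and word 14 (= high-lane word 6)
// from b, everything else from a.

// One-instruction option: needs a 32-byte constant mask load and a
// variable blend, which is slow on e.g. Jaguar.
__m256i blend_vpblendvb(__m256i a, __m256i b) {
    const __m256i m = _mm256_setr_epi16(-1, 0, 0, 0, 0, 0, 0, 0,
                                         0, 0, 0, 0, 0, 0, -1, 0);
    return _mm256_blendv_epi8(a, b, m);
}

// Two immediate blends: VPBLENDW repeats its imm8 per 128-bit lane, so
// over-blend words 0 and 6 in both lanes, then keep only the wanted
// dwords with VPBLENDD's full imm8.
__m256i blend_vpblendw_vpblendd(__m256i a, __m256i b) {
    __m256i t = _mm256_blend_epi16(a, b, 0x41); // words 0,6 of each lane from b
    return _mm256_blend_epi32(a, t, 0x81);      // dwords 0,7 from t, rest from a
}
```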

Differential Revision: https://reviews.llvm.org/D50074

llvm-svn: 340913
2018-08-29 10:51:08 +00:00
Craig Topper 9f42726cc7 [X86] Support v2i32 gather/scatter indices with -x86-experimental-vector-widening-legalization
Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp

Reviewers: RKSimon, spatel, efriedma

Reviewed By: efriedma

Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D51337

llvm-svn: 340891
2018-08-29 02:12:49 +00:00
Craig Topper 9401fd0ed2 [X86] Add intrinsics for KADD instructions
These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc, though they are missing from the Intel Intrinsics Guide.

This instruction adds two mask registers together as if they were scalars rather than vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recognize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins into clang with good codegen.

Differential Revision: https://reviews.llvm.org/D51370

llvm-svn: 340869
2018-08-28 19:22:55 +00:00
Craig Topper f1c111431b [X86] Fix copy paste mistake in vector-idiv-v2i32.ll. Add missing test case.
Some of the test cases contained the same load twice instead of a different load.

llvm-svn: 340833
2018-08-28 15:24:12 +00:00
Simon Pilgrim af98587095 [X86][SSE] Improve variable scalar shift of vXi8 vectors (PR34694)
This patch creates the shift mask and actual shift using the vXi16 vector shift ops.
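
Roughly, for a left shift (a scalar-mask sketch; the patch builds the mask with vXi16 shifts rather than scalar code):

```cpp
#include <immintrin.h>

// Shift all 16 bytes of v left by the same variable amount n in [0,7]:
// do the shift in 16-bit lanes, then clear the bits that leaked across
// byte boundaries.
__m128i shl_v16i8(__m128i v, unsigned n) {
    __m128i amt = _mm_cvtsi32_si128((int)n);
    __m128i r   = _mm_sll_epi16(v, amt);              // vXi16 shift
    __m128i m   = _mm_set1_epi8((char)(0xFFu << n));  // per-byte keep mask
    return _mm_and_si128(r, m);
}
```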

Differential Revision: https://reviews.llvm.org/D51263

llvm-svn: 340813
2018-08-28 10:37:29 +00:00
Simon Pilgrim f119e27d80 [X86][SSE] Avoid vector extraction/insertion for non-constant uniform shifts
As discussed on D51263, we're better off using byte shifts to clear the upper bits on pre-SSE41 hardware.

llvm-svn: 340810
2018-08-28 10:14:09 +00:00
Sanjay Patel fe0b5d215b [x86] add AVX runs to show more potential scalar->vector mov opportunities; NFC
llvm-svn: 340785
2018-08-27 22:29:06 +00:00
Craig Topper 171c6fe6cb [X86] Reverse the check prefixes in the test added in r340774.
The 32-bit and 64-bit checks were reversed.

llvm-svn: 340775
2018-08-27 21:34:37 +00:00
Craig Topper 76b18beef1 [X86] Add test cases to show current codegen of v2i32 div/rem in 32-bit and 64-bit modes
In particular this shows that we end up using libcalls in 32-bit mode even for division by constant.

llvm-svn: 340774
2018-08-27 21:13:07 +00:00
Sanjay Patel 7b6df50669 [x86] add tests for possibly avoiding scalar->vector move; NFC
llvm-svn: 340773
2018-08-27 20:21:33 +00:00
Craig Topper 4be11c0585 [X86] When lowering v32i8 MULHS/MULHU, shuffle after the PACKUS rather than before.
We're using a 256-bit PACKUS to do the truncation, but that instruction operates on 128-bit lanes. So previously we shuffled first to rearrange the lanes. But that requires 2 shuffles. Instead we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS.

Differential Revision: https://reviews.llvm.org/D51284

llvm-svn: 340757
2018-08-27 17:20:41 +00:00
Craig Topper fff90377fd [X86] Add support for matching paddus patterns where one of the vectors is a constant.
InstCombine mucks these up a bit, so we need to do some additional pattern matching to fix it. There are still a few special cases not handled, but this covers the general case.
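
For reference, a scalar sketch of the shape involved (a hypothetical example, not from the patch): InstCombine tends to rewrite the add-then-compare form of a saturating add with a constant into a compare against the complemented constant, which is what the new matching has to recognize.

```cpp
#include <cstdint>

// Straightforward unsigned saturating add with a constant:
uint8_t addus42(uint8_t x) {
    uint8_t r = x + 42;
    return r < x ? 255 : r;          // wrapped => saturate
}

// The canonicalized form: compare x against 255 - C and select,
// with no visible add-then-compare of the same value.
uint8_t addus42_canonical(uint8_t x) {
    return x > 255 - 42 ? 255 : x + 42;
}
```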

Differential Revision: https://reviews.llvm.org/D50952

llvm-svn: 340756
2018-08-27 17:20:38 +00:00
Aleksandr Urakov ff88f1763b [X86] Adding the test pointing to the fail case of D45653
Summary:
This commit adds the case of tail calling a sret function from a non-sret
function when both functions have the C calling convention.

llvm-svn: 340737
2018-08-27 11:56:32 +00:00
Aleksandr Urakov 6f7fef7865 [NFC][X86] Fix `sibcall.ll` formatting
Summary:
Remove unnecessary lines from `sibcall.ll` and rename labels according
to @RKSimon's recommendations in the D45653 conversation.

llvm-svn: 340735
2018-08-27 11:25:38 +00:00
Craig Topper 128915f4ae [X86] Add FeatureCMOV explicitly to all CPUs that support it. Remove FeatureCMOV implication from Feature64Bit and FeatureSSE1
Summary:
Previously most CPUs inherited cmov support through Feature64Bit (or FeatureCMPXCHG16B implying Feature64Bit) or FeatureSSE1.

This has the surprising side effect that -mattr=-cmov causes an assert to fire in 64-bit mode because it clears Feature64Bit. And in 32-bit mode, -mattr=-cmov disables any sse/avx features, which seems surprising.

This patch removes the implication and instead updates hasCMOV in X86Subtarget to check SSE1 or is64Bit in addition to the regular cmov flag. This should keep most things working the way they did before. I don't believe there is a way to specify "-cmov" directly from clang, so this should only affect our lower level tools.

This does stop -mattr=cx16 (cmpxchg16b) from implying cmov is enabled via the 64bit flag, as you can see from one of the changed tests. But that was a 32-bit test, so I don't know why it enabled cx16 anyway.

For the other test I had to add -sse to override the new sse check in hasCMOV.

Reviewers: RKSimon, DavidKreitzer, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D51228

llvm-svn: 340707
2018-08-26 18:29:33 +00:00
Craig Topper b68a78b9ac [X86] Add FeatureCMOV to athlon and athlon-tbird cpus.
Summary: This matches gcc and one cpuid dump I found online. Given that these are considered 7th generation x86 CPUs, it seems likely they support cmov, since cmov was added by Intel in their 6th generation.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51264

llvm-svn: 340706
2018-08-26 18:29:27 +00:00
Sanjay Patel 113cac3b15 [SelectionDAG][x86] turn insertelement into undef with variable index into splat
I noticed this along with the patterns in D51125, but when the index is variable, 
we don't convert insertelement into a build_vector.

For x86, that means these get expanded at legalization time into the loading/spilling 
code that we see in the tests. I think it's always better to avoid going to memory on 
these, and we get the optimal 'broadcast' if it's available.

I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have 
regression tests that would be affected (although I did not check what would happen in 
those cases). In the most basic cases shown here, AArch64 would probably do much 
better with a splat.

Differential Revision: https://reviews.llvm.org/D51186

llvm-svn: 340705
2018-08-26 18:20:41 +00:00
Craig Topper 7ef643ef17 [X86] Add test cases for D50952, paddus patterns involving constants. NFC
llvm-svn: 340694
2018-08-26 00:22:07 +00:00
Craig Topper ebec2793d1 [X86] Replace support for vXi32 SMUL_LOHI/UMUL_LOHI with MULHS/MULHU support instead.
Summary:
The only time vector SMUL_LOHI/UMUL_LOHI nodes are created is during division/remainder lowering. If it's created before op legalization, generic DAGCombine immediately turns that SMUL_LOHI/UMUL_LOHI into a MULHS/MULHU since only the upper half is used. That node will stick around through vector op legalization and will be turned back into UMUL_LOHI/SMUL_LOHI during op legalization. It will then be custom lowered by the X86 backend. Due to this two-step lowering, the vector shuffles created by the custom lowering get legalized after their inputs rather than before. This prevents the shuffles from being combined with any build_vector of constants.

This patch changes vXi32 to use MULHS/MULHU instead. This is what the later DAG combine did anyway. But by skipping the change back to UMUL_LOHI/SMUL_LOHI we lower it before any constant BUILD_VECTORS. This allows the vector_shuffle creation to constant fold with the build_vectors. This accounts for the test changes here.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51254

llvm-svn: 340690
2018-08-25 18:01:24 +00:00
Sanjay Patel 8a84c747d2 [x86] try harder to use broadcast to load a scalar into vector reg
This is a preliminary step for a preliminary step for D50992. 
I noticed that x86 often misses chances to load a scalar directly 
into a vector register.

So this patch is just allowing more of those cases to match a 
broadcast op in lowerBuildVectorAsBroadcast(). The old code comment 
said it doesn't make sense to use a broadcast when we're loading a 
single element and everything else is undef, but I think that's the 
best case in the improved tests in insert-loaded-scalar.ll. We avoid 
scalar-to-vector-register move and/or less efficient shuffling.

Note that there are some existing types that were already producing 
a broadcast, but that happens semi-accidentally. I.e., it's not
happening as part of lowerBuildVectorAsBroadcast(). The build vector 
gets expanded into load + shuffle, and then shuffle lowering produces 
the broadcast.

Description of the other test diffs:
1. avx-basic.ll - replacing load+shuffle is a win.
2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral
3. sse41.ll - don't care - we convert that intrinsic to generic IR now, so this test is deprecated
4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - pshufb alternatives with an extra instruction are not obviously bad

Differential Revision: https://reviews.llvm.org/D51125

llvm-svn: 340685
2018-08-25 14:56:05 +00:00
Simon Pilgrim eb6a3cbb28 [X86] Make requested test changes from D50636
The tests were relying on X / X -> 1 and X % X -> 0 combines not happening in the DAG.

llvm-svn: 340682
2018-08-25 14:16:03 +00:00
Bjorn Pettersson 8483004723 [LiveDebugVariables] Avoid faulty addDefsFromCopies in computeIntervals
Summary:
When computeIntervals is looking through COPY instruction to
extend the location mapping for a debug variable it did not
handle subregisters correctly.

For example
    DBG_VALUE debug-use %0.sub_8bit_hi, ...
    %1:gr16 = COPY %0
was transformed into
    DBG_VALUE debug-use %0.sub_8bit_hi, ...
    %1:gr16 = COPY %0
    DBG_VALUE debug-use %1, ...
So the subregister index was missing in the added DBG_VALUE.

As long as the subreg referred to the least significant bits
of the superreg, I guess we could get the correct
result in a debugger even when referring to the superreg.
But when, as in the example above, the subreg refers to other
parts of the superreg, the debuginfo would be incorrect.

I'm not sure exactly how to fix this properly, so this patch
just avoids looking through the COPY when there is a subreg
involved (for more info, see the FIXME added in the code).

Reviewers: rnk, aprantl

Reviewed By: aprantl

Subscribers: JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D50788

llvm-svn: 340679
2018-08-25 10:02:03 +00:00
Peter Collingbourne 3f792230cb CodeGen: Add two more conditions for adding symbols to the address-significance table.
Firstly, require the symbol to be used within the module. If a
symbol is unused within a module, then by definition it cannot be
address-significant within that module. This condition is useful on all
platforms because it could make symbol tables smaller -- without this
change, emitting an address-significance table could cause otherwise
unused undefined symbols to be added to the object file.

But this change is necessary with COFF specifically in order to
preserve the property that an unreferenced undefined symbol in an IR
module does not result in a link failure. This is already the case for
ELF because ELF linkers only reject links with unresolved symbols if
there is a relocation to that symbol, but COFF linkers require all
undefined symbols to be resolved regardless of relocations. So if
a module contains an unreferenced undefined symbol, we need to make
sure not to add it to the address-significance table (and thus the
symbol table) in case it doesn't end up resolved at link time.

Secondly, do not add dllimport symbols to the table. These symbols
won't be able to be resolved because their definitions live in another
module and are accessed via the IAT, and the address-significance
table has no effect on other modules anyway. It wouldn't make sense
to add the IAT entry symbol to the address-significance table either
because the IAT entry isn't address-significant -- the generated code
never takes its address.

Differential Revision: https://reviews.llvm.org/D51199

llvm-svn: 340648
2018-08-24 20:37:09 +00:00
Stefan Pintilie 892fc6b7f2 [Exception Handling] Unwind tables are required for all functions that have an EH personality.
This patch is for defect:
https://bugs.llvm.org/show_bug.cgi?id=32611

Functions may require unwind tables even if they are marked with the attribute
nounwind. Any function with an EH personality may require an unwind table.

Differential Revision: https://reviews.llvm.org/D50987

llvm-svn: 340641
2018-08-24 19:38:29 +00:00
Craig Topper 4058e29e7d [X86] Teach combineLoopMAddPattern to handle cases where there is no loop and the add has two multiply inputs
Differential Revision: https://reviews.llvm.org/D50868

llvm-svn: 340631
2018-08-24 18:05:04 +00:00
Craig Topper 3c78622d64 [X86] Add test case for D50868. NFC
llvm-svn: 340630
2018-08-24 18:05:02 +00:00
Stefan Pintilie 7cb44f2470 Revert "[Exception Handling] Unwind tables are required for all functions that have an EH personality."
This reverts commit rL340614.
Previous commit broke some llvm-cfi-verify tests.

llvm-svn: 340625
2018-08-24 17:27:35 +00:00
Stefan Pintilie 36f31617d3 [Exception Handling] Unwind tables are required for all functions that have an EH personality.
This patch is for defect:
https://bugs.llvm.org/show_bug.cgi?id=32611

Functions may require unwind tables even if they are marked with the attribute
nounwind. Any function with an EH personality may require an unwind table.

Differential Revision: https://reviews.llvm.org/D50987

llvm-svn: 340614
2018-08-24 15:51:47 +00:00
Sanjay Patel 851e02e52e [x86] move/add tests for insertelement with variable index; NFC
The variable index pattern is different than the constant index
cases as shown in D51125. We might want to splat regardless of
whether the scalar is loaded from memory or transferred from GPR.

llvm-svn: 340565
2018-08-23 18:38:40 +00:00
Chandler Carruth ae0cafece8 [x86/retpoline] Split the LLVM concept of retpolines into separate
subtarget features for indirect calls and indirect branches.

This is in preparation for enabling *only* the call retpolines when
using speculative load hardening.

I've continued to use subtarget features for now as they continue to
seem the best fit given the lack of other retpoline-like constructs so
far.

The LLVM side is pretty simple. I'd like to eventually get rid of the
old feature, but not sure what backwards compatibility issues that will
cause.

This does remove the "implies" from requesting an external thunk. This
always seemed somewhat questionable and is now clearly not desirable --
you specify a thunk the same way no matter which set of things are
getting retpolines.

I really want to keep this nicely isolated from end users and just an
LLVM implementation detail, so I've moved the `-mretpoline` flag in
Clang to no longer rely on a specific subtarget feature by that name and
instead to be directly handled. In some ways this is simpler, but in
order to preserve existing behavior I've had to add some fallback code
so that users who relied on merely passing -mretpoline-external-thunk
continue to get the same behavior. We should eventually remove this
I suspect (we have never tested that it works!) but I've not done that
in this patch.

Differential Revision: https://reviews.llvm.org/D51150

llvm-svn: 340515
2018-08-23 06:06:38 +00:00
Craig Topper cf9df99d79 [X86] Teach combineLoopSADPattern to handle cases where there is no loop and the add has two absolute difference inputs
Previously we assumed a vector reduction add is part of a loop and that one of its inputs is a phi. But the code in SelectionDAGBuilder that sets the vector reduction flag handles more cases than that. It just requires that the use chain ends in a horizontal reduction and that there are no other uses. This means it can handle unrolled reduction loops.

If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two SAD inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides.
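
A hypothetical source shape that produces this DAG (not a test from the patch): a two-way unrolled reduction whose partial sums start at zero ends in a single add with a SAD chain on each side.

```cpp
#include <cstdint>
#include <cstdlib>

// Assumes n is even. The final `s0 + s1` is a vector reduction add
// whose two inputs are both sum-of-absolute-differences chains, with
// no phi on either side.
int sad(const uint8_t *a, const uint8_t *b, int n) {
    int s0 = 0, s1 = 0;
    for (int i = 0; i < n; i += 2) {
        s0 += std::abs(a[i] - b[i]);
        s1 += std::abs(a[i + 1] - b[i + 1]);
    }
    return s0 + s1;
}
```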

I've created a lambda to reuse some of the code for both sides, and fixed the variable names to remove the reference to "phi".

Differential Revision: https://reviews.llvm.org/D50817

llvm-svn: 340478
2018-08-22 23:19:01 +00:00
Craig Topper 903ef6a03f [X86] Add test cases for D50817. NFC
llvm-svn: 340477
2018-08-22 23:18:58 +00:00
Sanjay Patel ed1b9695ee [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527)
This solves the motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38527

If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, 
but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll 
the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge 
that some of the scalar elements are undef.

Differential Revision: https://reviews.llvm.org/D50791

llvm-svn: 340469
2018-08-22 22:52:05 +00:00
Craig Topper 538f8ab438 [X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware
Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can.

This is done directly as part of isel so it's as late as possible to avoid breaking some BZHI patterns, since those patterns need an unmasked (32-n) to be correct.

To avoid manual load folding and custom instruction selection for the negate, I've inserted new nodes in the DAG above the shift node in topological order.
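
The underlying identity, in scalar form (a sketch):

```cpp
#include <cstdint>

// Both bodies produce identical results for any n: hardware SHL only
// reads the low 5 bits of the count, and (32 - n) and (0 - n) agree
// modulo 32, so the SUB from an immediate can become a NEG.
uint32_t viaSub(uint32_t x, uint32_t n) { return x << ((32 - n) & 31); }
uint32_t viaNeg(uint32_t x, uint32_t n) { return x << ((0 - n) & 31); }
```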

Differential Revision: https://reviews.llvm.org/D48789

llvm-svn: 340441
2018-08-22 19:39:09 +00:00
Sanjay Patel b5686c4e4e [x86] add tests for load scalar + insertelement; NFC
llvm-svn: 340425
2018-08-22 17:46:28 +00:00
Simon Pilgrim ffdfe45645 [X86][SSE] LowerMULH vXi8 - use SSE shifts directly.
We know these vXi16 extended cases are legal constant splat shifts.

llvm-svn: 340414
2018-08-22 15:37:11 +00:00
Simon Pilgrim b89a4f85bf [X86][SSE] Add sdiv test case from PR38658
llvm-svn: 340393
2018-08-22 09:47:12 +00:00
Bjorn Pettersson e06321382b [RegisterCoalescer] Use substPhysReg in reMaterializeTrivialDef
Summary:
When RegisterCoalescer::reMaterializeTrivialDef is substituting
a register use in a DBG_VALUE instruction, and the old register
is a subreg, and the new register is a physical register,
then we need to use substPhysReg in order to extract the correct
subreg.

Reviewers: wmi, aprantl

Reviewed By: wmi

Subscribers: hiraditya, MatzeB, qcolombet, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D50844

llvm-svn: 340326
2018-08-21 19:47:32 +00:00
Simon Pilgrim 9848e0c9ac [X86][SSE] Add non-uniform udiv test that is mostly divide by 1.
The test demonstrates over-complicated codegen for a udiv where only one divisor doesn't equal 1. This should have allowed the codegen to be a lot simpler (uniform shifts etc.), but only the SSE2 codegen manages to make use of this.

llvm-svn: 340313
2018-08-21 18:02:28 +00:00
Craig Topper b172b8884a [BypassSlowDivision] Teach bypass slow division not to interfere with div by constant where constants have been constant hoisted, but not moved from their basic block
DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block: the constant will be hoisted, but will not leave the block.
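
The problematic shape, roughly (a hypothetical example):

```cpp
#include <cstdint>

// div and rem of the same hoisted constant in a single basic block.
// Splitting this block for a "small dividend" fast path would hide the
// now-opaque constant from DAGCombiner's divide-by-constant
// optimization, leaving a real divide (or libcall) behind.
void divrem(uint64_t x, uint64_t *q, uint64_t *r) {
    *q = x / 1234567890123ULL;
    *r = x % 1234567890123ULL;
}
```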

Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all.

This fixes the case from PR38649.

Differential Revision: https://reviews.llvm.org/D51000

llvm-svn: 340303
2018-08-21 17:15:33 +00:00
Simon Pilgrim 43cf2c20ab [X86] Add SSE2 and XOP udiv combine tests
llvm-svn: 340282
2018-08-21 15:21:45 +00:00
Simon Pilgrim 8e15b43092 [X86] Add SSE2 sdiv combine tests
llvm-svn: 340264
2018-08-21 10:44:06 +00:00
Sam Parker 597811e7a7 [DAGCombiner] Reduce load widths of shifted masks
During combining, ReduceLoadWidth is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.
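
In effect (a little-endian C++ sketch of the combine's before/after, not the DAG code):

```cpp
#include <cstdint>
#include <cstring>

// Before: full 32-bit load, then AND with the shifted mask 0x00FF0000.
uint32_t before(const unsigned char *p) {
    uint32_t v;
    std::memcpy(&v, p, 4);
    return v & 0x00FF0000u;
}

// After (little-endian): narrow 8-bit load from offset 2, zero-extended,
// then shifted left to compensate for the new offset.
uint32_t after(const unsigned char *p) {
    return (uint32_t)p[2] << 16;
}
```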

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 340261
2018-08-21 10:26:59 +00:00
Simon Pilgrim 72b324de4d [TargetLowering] Add BuildSDiv support for division by one or negone.
This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and uses the numerator scale factor to multiply by +1/-1.

llvm-svn: 340260
2018-08-21 10:20:36 +00:00
Bjorn Pettersson 880f291577 [RegisterCoalescer] Do not assert when trying to remat dead values
Summary:
RegisterCoalescer::reMaterializeTrivialDef used to assert that
the input register was live in. But as shown by the new
coalesce-dead-lanes.mir test case that seems to be a valid
scenario. We now return false instead of asserting, simply
avoiding rematerializing the dead def.

Normally a COPY of an undef value is eliminated by
eliminateUndefCopy(). Although we only do that when the
destination isn't a physical register. So the situation
above should be limited to the case when we copy an undef
value to a physical register.

Reviewers: kparzysz, wmi, tpr

Reviewed By: kparzysz

Subscribers: MatzeB, qcolombet, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D50842

llvm-svn: 340255
2018-08-21 07:49:05 +00:00
Craig Topper 9c57ba0dc3 [X86] Add test command line to expose PR38649.
Bypass slow division and constant hoisting are conspiring to break div+rem of large constants.

llvm-svn: 340217
2018-08-20 21:51:35 +00:00
Craig Topper 210ccfe3db [X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles
Due to some splat handling code in getVectorShuffle, it's possible for NewV1/NewV2 to have their masks modified from what was requested. This can lead to cycles being created in the DAG.

This patch examines the returned mask and makes sure it's different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle.

Fixes PR38639

Differential Revision: https://reviews.llvm.org/D50981

llvm-svn: 340214
2018-08-20 21:08:35 +00:00
Craig Topper 7dcb2c4b0a [X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB
We can safely avoid interfering with the subus combine if both inputs are freely truncatable: either both extends, or an extend and a constant vector.

Differential Revision: https://reviews.llvm.org/D50878

llvm-svn: 340212
2018-08-20 20:57:35 +00:00
Craig Topper 08e7e04998 [X86] Pre-commit test cases for D50878.
llvm-svn: 340211
2018-08-20 20:57:32 +00:00
Cameron McInally 94b9029be9 [FPEnv] Support constrained FREM intrinsic
Differential Revision: https://reviews.llvm.org/D50975

llvm-svn: 340201
2018-08-20 19:28:56 +00:00
Simon Pilgrim 6ac905926f [TargetLowering] Disable BuildSDiv division by one or negone.
Fuzz tests have detected an issue, currently working on a fix.

llvm-svn: 340195
2018-08-20 18:23:54 +00:00
Simon Pilgrim 5b78c9d58d [SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.

Handle the case where the sign bit extends to only part of the small elements.

llvm-svn: 340169
2018-08-20 13:05:48 +00:00
Simon Pilgrim 11bec5b80c [X86][SSE] Fix PACKSS bitcast test from rL340166
We need the sign bits to extend to the lower 16 bits of the even elements.

llvm-svn: 340167
2018-08-20 11:47:15 +00:00
Simon Pilgrim cee9c64838 [X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle a partial sign bits extension through a bitcast
llvm-svn: 340166
2018-08-20 11:10:12 +00:00
Simon Pilgrim 686090a45f [X86] Drop unnecessary exact qualifier from packss test
llvm-svn: 340165
2018-08-20 11:01:51 +00:00
Simon Pilgrim 5b936ec89e [SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.

The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.

llvm-svn: 340143
2018-08-19 17:47:50 +00:00
Simon Pilgrim 0fd72ab44f [X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle demanded elts through a bitcast
llvm-svn: 340139
2018-08-19 16:01:47 +00:00
Craig Topper 803912ea57 [X86] Fix an issue in the matching for ADDUS.
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturating add.

This rewrites the canonicalization to just be based on the condition code.

llvm-svn: 340134
2018-08-19 04:26:31 +00:00
Craig Topper a85d7e927b [X86] Add a test case showing an issue in our addusw pattern matching.
We are unable to handle a normal add followed by a saturating add with certain operand orders on the icmp.

llvm-svn: 340133
2018-08-19 04:26:29 +00:00
Craig Topper 40c9559b74 [X86] Add support for using 512-bit PSUBUS to combineSelect.
The code already supports 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements.

While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular.

llvm-svn: 340128
2018-08-18 18:51:03 +00:00
Craig Topper b40a1d5f84 [X86] Add test cases to show missed opportunities to use 512-bit PSUBUS.
llvm-svn: 340127
2018-08-18 18:50:59 +00:00
Craig Topper 911efbb926 [X86] Add a signed test case for PR38622. Use nounwind to reduce the output on the unsigned test case.
llvm-svn: 340121
2018-08-18 06:00:16 +00:00
Craig Topper cc5dbbf759 [DAGCombiner] Allow divide by constant optimization on opaque constants.
Summary:
I believe this restores the behavior we had before r339147.

Fixes PR38622.

Reviewers: RKSimon, chandlerc, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50936

llvm-svn: 340120
2018-08-18 05:52:42 +00:00
Simon Pilgrim 2f48122cc9 [X86][SSE] Lower constant vXi8 ISD::SRL/ISD::SRA using PMULLW
Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result.
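
The per-byte arithmetic in scalar form (a sketch of the SRL case; SRA sign-extends instead):

```cpp
#include <cstdint>

// x >> c == high 8 bits of (zext16(x) * (1 << (8 - c))), for c in [0,7].
// Vectorized, the multiply is a single PMULLW against a constant vector,
// so c can differ per element.
uint8_t srlViaMul(uint8_t x, unsigned c) {
    return (uint8_t)((uint16_t)(x * (1u << (8 - c))) >> 8);
}
```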

Differential Revision: https://reviews.llvm.org/D50781

llvm-svn: 340062
2018-08-17 18:03:11 +00:00
Francis Visoiu Mistrih f006b491bd [x86] Fix test breaking on Darwin after r339962
* -march=x86-64 -> -mtriple=x86_64-unknown-linux to avoid _ prefixes to
symbols
* add -start-before to avoid running the whole codegen on the IR. I
assumed it is meant to be running after X86SpeculativeLoadHardening.

llvm-svn: 340034
2018-08-17 14:47:01 +00:00
Francis Visoiu Mistrih 8bff832534 [X86] Fix liveness information when expanding X86::EH_SjLj_LongJmp64
test/CodeGen/X86/shadow-stack.ll has the following machine verifier
errors:

```
*** Bad machine code: Using a killed virtual register ***
- function:    bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1:   killed %2:gr64

*** Bad machine code: Using a killed virtual register ***
- function:    bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1:   killed %2:gr64

*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```

The fix here is to only copy the machine operand's register without the
kill flags for all the instructions except the very last one of the
sequence.

I had to insert dummy PHIs in the test case to force the NoPHI function
property to be set to false. More on this here: https://llvm.org/PR38439

Differential Revision: https://reviews.llvm.org/D50260

llvm-svn: 340033
2018-08-17 14:46:56 +00:00
Simon Pilgrim 03e57521c0 [DAGCombiner] extractShiftForRotate - fix out of range shift issue
Don't just check for negative shift amounts.

Fixes OSS Fuzz #9935
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935

llvm-svn: 340015
2018-08-17 12:25:18 +00:00
Simon Pilgrim 5113b48798 [DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding
Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly.
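
A scalar view of the fold and the clamp (sketch):

```cpp
#include <cstdint>

// (x >> c1) >> c2 == x >> (c1 + c2) while c1 + c2 < 32; once the sum
// reaches the bit width, the result is pure sign fill, i.e. x >> 31.
int32_t fold(int32_t x)  { return (x >> 3) >> 2; }    // == x >> 5
int32_t clamp(int32_t x) { return (x >> 20) >> 20; }  // == x >> 31
```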

Differential Revision: https://reviews.llvm.org/D35722

llvm-svn: 340010
2018-08-17 10:52:49 +00:00
Chandler Carruth 75ca6be1c1 [x86/MIR] Implement support for pre- and post-instruction symbols, as
well as MIR parsing support for `MCSymbol` `MachineOperand`s.

The only real way to test pre- and post-instruction symbol support is to
use them in operands, so I ended up implementing that within the patch
as well. I can split out the operand support if folks really want but it
doesn't really seem worth it.

The functional implementation of pre- and post-instruction symbols is
now *completely trivial*. Two tiny bits of code in the (misnamed)
AsmPrinter. It should be completely target independent as well. We emit
these exactly the same way as we emit basic block labels. Most of the
code here is to give full dumping, MIR printing, and MIR parsing support
so that we can write useful tests.

The MIR parsing of MC symbol operands still isn't 100%, as it forces the
symbols to be non-temporary and non-local symbols with names. However,
those names often can encode most (if not all) of the special semantics
desired, and unnamed symbols seem especially annoying to serialize and
de-serialize. While this isn't perfect or full support, it seems plenty
to write tests that exercise usage of these kinds of operands.

The MIR support for pre-and post-instruction symbols was quite
straightforward. I chose to print them out in an as-if-operand syntax
similar to debug locations as this seemed the cleanest way and let me
use nice introducer tokens rather than inventing more magic punctuation
like we use for memoperands.

However, supporting MIR-based parsing of these symbols caused me to
change the design of the symbol support to allow setting arbitrary
symbols. Without this, I don't see any reasonable way to test things
with MIR.

Differential Revision: https://reviews.llvm.org/D50833

llvm-svn: 339962
2018-08-16 23:11:05 +00:00
Craig Topper 883ff69c93 [DAGCombiner] Don't reassociate operations that have the vector reduction flag set.
When nodes are reassociated the vector-reduction flag gets lost.

The test case here shows what would happen if you had a sum of absolute differences loop that started with a non-zero but constant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree, erasing the vector-reduction flag. Interestingly, this moves constants in the opposite direction of the reassociate IR pass.

I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set.

Differential Revision: https://reviews.llvm.org/D50827

llvm-svn: 339946
2018-08-16 21:54:05 +00:00
Craig Topper bde2b43cb3 [X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole
Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up.

To fix this, the eflags pass now emits a COPY with a subreg input.

I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs.

Differential Revision: https://reviews.llvm.org/D50656

llvm-svn: 339945
2018-08-16 21:54:02 +00:00
Craig Topper 3dfc5af178 [X86] Pre-commit test case for D50827.
llvm-svn: 339926
2018-08-16 19:27:43 +00:00
Eli Friedman 73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.
There is no way in the universe that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Barring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.
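
A scalar C++ model of the half-width expansion (a sketch of the algorithm's shape, not the exact DAG the patch builds):

```cpp
#include <cstdint>

// 64-bit overflowing multiply from 32x32->64 half products. The full
// product is ll + ((lh + hl) << 32) + (hh << 64); overflow is anything
// that lands at or above bit 64.
bool umul64Overflow(uint64_t a, uint64_t b, uint64_t *res) {
    uint64_t aL = (uint32_t)a, aH = a >> 32;
    uint64_t bL = (uint32_t)b, bH = b >> 32;
    uint64_t ll = aL * bL;
    uint64_t lh = aL * bH;               // contributes at bit 32
    uint64_t hl = aH * bL;               // contributes at bit 32
    bool ovf = (aH != 0) && (bH != 0);   // hh term sits at bit 64
    uint64_t cross = lh + hl;
    ovf |= cross < lh;                   // carry out of the cross-term add
    ovf |= (cross >> 32) != 0;           // cross bits reaching bit 64
    *res = ll + (cross << 32);
    ovf |= *res < ll;                    // carry from the final add
    return ovf;
}
```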

The following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of original, so anything under 1 is
“better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64 (one Intel Haswell,
other AMD Ryzen) based machines. Size numbers have been collected by
looking at the size of function containing an overflowing multiply in
a loop.

All in all, it can be seen that both performance and size have improved
except in the case of armv7 where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seem to
benefit from this change a lot, taking only 5% of the time compared to
original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922
2018-08-16 18:39:39 +00:00
Simon Pilgrim 87d0039a45 [TargetLowering] Add support for non-uniform vectors to BuildSDIV
This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators.
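
For the uniform case, the scalar shape of what BuildSDIV emits looks like this (a Hacker's Delight style sketch for divisor 7; the patch generalizes the magic constants per vector element):

```cpp
#include <cstdint>

// n / 7 without a division: multiply by the magic constant, keep the
// high half (MULHS), add the numerator back (this magic is negative),
// shift, then add one for negative quotients to round toward zero.
int32_t sdiv7(int32_t n) {
    int32_t q = (int32_t)(((int64_t)n * (int32_t)0x92492493) >> 32);
    q += n;
    q >>= 2;
    q += (uint32_t)q >> 31;
    return q;
}
```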

This is the last patch necessary to close PR36545

Differential Revision: https://reviews.llvm.org/D50765

llvm-svn: 339908
2018-08-16 17:44:33 +00:00
Simon Pilgrim 8b9e545477 [X86][SSE] Add sdiv by nonuniform constant vector test containing -1/+1 and all-bits style constants
llvm-svn: 339901
2018-08-16 17:07:41 +00:00
Craig Topper 9c1d9fdeaa [X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead.
llvm-svn: 339842
2018-08-16 06:20:24 +00:00
Craig Topper 9d6983c9fd [X86] Remove the unused masked 128 and 256-bit masked padds/psubs intrinsics.
Still need to remove masking from the 512-bit versions.

llvm-svn: 339841
2018-08-16 06:20:22 +00:00
Craig Topper 054b8cce2d [X86] Correct some bad FileCheck prefixes in tests. Add test cases for v64i8 padd/psub saturation intrinsics.
For some reason we had the 128/256-bit tests, but not the 512-bit tests.

llvm-svn: 339840
2018-08-16 06:20:19 +00:00
Chandler Carruth 00c35c7794 [x86] Actually initialize the SLH pass with the x86 backend and use
a shorter name ('x86-slh') for the internal flags and pass name.

Without this, you can't use the -stop-after or -stop-before
infrastructure. I seem to have just missed this when originally adding
the pass.

The shorter name solves two problems. First, the flag names were ...
really long and hard to type/manage. Second, the pass name can't be the
exact same as the flag name used to enable this, and there are already
some users of that flag name so I'm avoiding changing it unnecessarily.

llvm-svn: 339836
2018-08-16 01:22:19 +00:00
Craig Topper 08e082619a [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation
To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to produce the final result.

This fixes PR35833

Differential Revision: https://reviews.llvm.org/D41794

llvm-svn: 339818
2018-08-15 21:21:52 +00:00
Sanjay Patel 712d42f53d [x86] add fabs test for vector intrinsic to potential libcall bug; NFC
This is a negative test for x86 because it has custom lowering for fabs.

llvm-svn: 339791
2018-08-15 16:56:09 +00:00
Sanjay Patel f9afee479f [x86] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC
llvm-svn: 339790
2018-08-15 16:35:50 +00:00