Commit Graph

19442 Commits

Author SHA1 Message Date
Craig Topper 7bb433c87b [X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem.
We can use a MOVSX16 here then rely on FixupBWInst to change to
MOVSX32 if the upper bits are dead. With a special case to
not promote if it could be turned into CBW.

Then we can rely on X86MCInstLower to turn the MOVSX into CBW
very late if register allocation worked out.

Using MOVSX gives an opportunity to use the MOVSX as a both a
copy and a sign extend since the input and output register aren't
tied together.

Differential Revision: https://reviews.llvm.org/D67192

llvm-svn: 371243
2019-09-06 19:17:02 +00:00
Craig Topper 22b35c4291 [X86] Use MOVZX16rr8/MOVZXrm8 when extending input for i8 udivrem.
We can rely on X86FixupBWInsts to turn these into MOVZX32. This
simplifies a follow up commit to use MOVSX for i8 sdivrem with
a late optimization to use CBW when register allocation works out.

llvm-svn: 371242
2019-09-06 19:15:04 +00:00
Craig Topper 0364d89b6d [X86] Teach FixupBWInsts to turn MOVSX16rr8/MOVZX16rr8/MOVSX16rm8/MOVZX16rm8 into their 32-bit dest equivalents when the upper part of the register is dead.
llvm-svn: 371240
2019-09-06 19:14:49 +00:00
Guillaume Chatelet ad1cea0dda [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67267

llvm-svn: 371212
2019-09-06 15:03:49 +00:00
Guillaume Chatelet 9fcf066d0c [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67278

llvm-svn: 371210
2019-09-06 14:51:15 +00:00
Craig Topper 7739fbc9c3 [X86] Fix bad indentation. NFC
llvm-svn: 371167
2019-09-06 05:50:46 +00:00
Craig Topper 0fde412140 [X86] Enable BuildSDIVPow2 for i16.
We're able to use a 32-bit ADD and CMOV here and should work
well with our other i16->i32 promotion optimizations.

llvm-svn: 371107
2019-09-05 18:49:52 +00:00
Craig Topper b8d6ba3ca2 [X86] Override BuildSDIVPow2 for X86.
As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.

This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.

I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.

I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.

Differential Revision: https://reviews.llvm.org/D67087

llvm-svn: 371104
2019-09-05 18:15:07 +00:00
Sanjay Patel 10412a69f9 [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225)
https://bugs.llvm.org/show_bug.cgi?id=43225

llvm-svn: 371095
2019-09-05 17:28:17 +00:00
Craig Topper 97aa42f5df [X86] Add a FIXME about why the CWD/CDQ/CQO have a bogus implicit def of the A register. NFC
The instructions copy the sign bit of the A register to every bit
of the D register. But they don't write to the A register.

llvm-svn: 371094
2019-09-05 17:24:34 +00:00
Craig Topper a5508163ad [X86] Fix stale comment. NFC
We aren't checking for a concat here. We're just always splitting
256-bit stores.

llvm-svn: 371092
2019-09-05 17:24:15 +00:00
Simon Pilgrim 29361c704d [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227)
As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset.

Until we can properly handle this, we must bail from EltsFromConsecutiveLoads.

llvm-svn: 371078
2019-09-05 15:07:07 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Simon Pilgrim 082750fe68 [X86] X86SpeculativeLoadHardeningPass::canHardenRegister - fix out of bounds warning.
Fixes clang static-analyzer warning.

llvm-svn: 371050
2019-09-05 10:26:38 +00:00
Simon Pilgrim 67991a59cb [X86] X86InstrInfo::optimizeCompareInstr - fix potential null dereference.
Fixes clang static-analyzer warning.

Technically the MachineInstr *Sub might still be null if we're comparing zero (IsCmpZero == true), although this probably won't happen as SrcReg2 is probably == 0.

llvm-svn: 371047
2019-09-05 10:18:24 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
Reid Kleckner 3fa07dee94 Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
This reverts r370525 (git commit 0bb1630685)
Also reverts r370543 (git commit 185ddc08ee)

The approach I took only works for functions marked `noreturn`. In
general, a call that is not known to be noreturn may be followed by
unreachable for other reasons. For example, there could be multiple call
sites to a function that throws sometimes, and at some call sites, it is
known to always throw, so it is followed by unreachable. We need to
insert an `int3` in these cases to pacify the Windows unwinder.

I think this probably deserves its own standalone, Win64-only fixup pass
that runs after block placement. Implementing that will take some time,
so let's revert to TrapUnreachable in the mean time.

llvm-svn: 370829
2019-09-03 22:27:27 +00:00
Amara Emerson fbaf425b79 [GlobalISel][CallLowering] Add support for splitting types according to calling conventions.
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.

Support for splitting for return types not yet implemented.

Differential Revision: https://reviews.llvm.org/D66180

llvm-svn: 370822
2019-09-03 21:42:28 +00:00
Simon Pilgrim 99525bbe49 [X86] Merge 2 consecutive HasInt256 branches. NFCI.
llvm-svn: 370761
2019-09-03 14:39:06 +00:00
Craig Topper b915109043 [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit.
This merges the 32-bit and 64-bit mode code to just use Custom
for both i32 and i64. We already had most of the handling in
the custom handling due to the AVX512 having legal fp_to_uint.
Just needed to add the i32->i64 promotion handling. Refactor
the fp_to_uint code in the custom handler to simplify the
number of times we check things.

Tweak cost model tables to match the default handling we were
getting due to Expand before.

llvm-svn: 370700
2019-09-03 05:57:22 +00:00
Craig Topper 9dc8c448ed [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets on 32-bit target.
Use Custom lowering instead. Fall back to default expansion only
when the scalar FP type belongs in an XMM register. This improves
lowering for i32 to fp80, and also i32 to double on SSE1 only.

llvm-svn: 370699
2019-09-03 05:57:18 +00:00
Craig Topper dcecc7ea46 [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets.
Reuse the same code to promote all i32 uint_to_fp on 64-bit targets
to simplify the X86ISelLowering constructor.

llvm-svn: 370693
2019-09-03 02:51:10 +00:00
Craig Topper 45cd185109 [X86] Enable fp128 as a legal type with SSE1 rather than with MMX.
FP128 values are passed in xmm registers so should be asssociated
with an SSE feature rather than MMX which uses a different set
of registers.

llc enables sse1 and sse2 by default with x86_64. But does not
enable mmx. Clang enables all 3 features by default.

I've tried to add command lines to test with -sse
where possible, but any test that returns a value in an xmm
register fails with a fatal error with -sse since we have no
defined ABI for that scenario.

llvm-svn: 370682
2019-09-02 20:16:30 +00:00
Simon Pilgrim fb5661a884 [X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI.
llvm-svn: 370670
2019-09-02 15:10:35 +00:00
Andrea Di Biagio 528f68144b [X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions.
On BtVer2 conditional SIMD stores are heavily microcoded.
The latency is directly proportional to the number of packed elements extracted
from the input vector. Also, according to micro-benchmarks, most of the
computation seems to be done in the integer unit.

Only a minority of the uOPs is executed by the FPU. The observed behaviour on
the FPU looks similar to this:
 - The input MASK value is moved to the Integer Unit
   -- [ a VMOVMSK-like uOP-executed on JFPU0].
 - In parallel, each element of the input XMM/YMM is extracted and then sent to
   the IntegerUnit through JFPU1.

As expected, a (conditional) store is executed for every extracted element.
Interestingly, a (speculative) load is executed for every extracted element too.
It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the
integer unit for every contionally stored element.
VMASKMOVDQU is a special case: the number of speculative loads is always 2
(presumably, one load per quadword). That means, extra shifts and masking is
performed on (one of) the loaded quadwords before each conditional store (that
also explains the big number of non-FP uOPs retired).

This patch replaces the existing writes for conditional SIMD stores (i.e.
WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes:

  WriteFMaskedStore32  [ XMM Packed Single ]
  WriteFMaskedStore32Y [ YMM Packed Single ]
  WriteFMaskedStore64  [ XMM Packed Double ]
  WriteFMaskedStore64Y [ YMM Packed Double ]

Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe
both RM and MR variants for conditional SIMD moves in a single tablegen
definition.
Instances of that class are then passed in input to multiclass avx_movmask_rm
when constructing MASKMOVPS/PD definitions.

Since this patch introduces new writes, I had to update all the X86 scheduling
models.

Differential Revision: https://reviews.llvm.org/D66801

llvm-svn: 370649
2019-09-02 12:32:28 +00:00
Simon Pilgrim 05a3a92751 [X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI.
llvm-svn: 370637
2019-09-02 10:42:48 +00:00
Craig Topper 3ab210862a [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load
MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.

Differential Revision: https://reviews.llvm.org/D67017

llvm-svn: 370620
2019-09-01 22:14:36 +00:00
Simon Pilgrim 07de5292e5 [X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI.
Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed.

Cleanup the in-lane shuffle mask generation to make it more obvious what's going on.

Some prep work noticed while investigating the poor shuffle code mentioned in D66004.

llvm-svn: 370613
2019-09-01 16:04:28 +00:00
Simon Pilgrim 27cc2efaf2 Fix shadow variable warning. NFCI.
llvm-svn: 370610
2019-09-01 13:10:18 +00:00
Craig Topper 1594605416 [X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr.
This is what the copies will eventually be turned into. We don't
use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should
use the real instruction here too.

llvm-svn: 370601
2019-08-31 23:52:25 +00:00
Craig Topper 1329cc6e01 [X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch.
llvm-svn: 370600
2019-08-31 23:52:21 +00:00
Simon Pilgrim f8d1d00190 [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170)
EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.

llvm-svn: 370592
2019-08-31 16:21:31 +00:00
Simon Pilgrim cffbec63d6 Fix shadow variable warning by making CondCodes names more explicit. NFCI.
llvm-svn: 370589
2019-08-31 15:19:59 +00:00
Simon Pilgrim ad020c0af1 Fix shadow variable warning. NFCI.
llvm-svn: 370585
2019-08-31 15:01:03 +00:00
Simon Pilgrim 2d89007f61 [X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI.
Only compute the diff once and we don't need the truncation code (assert the bitwidth is correct just to be safe).

llvm-svn: 370583
2019-08-31 14:18:26 +00:00
Simon Pilgrim 7238353da2 [X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI.
VT of SELECT result and selection ops will be the same.

llvm-svn: 370581
2019-08-31 13:14:52 +00:00
Reid Kleckner 185ddc08ee Fix SEH_NoReturn machine verifier error
llvm-svn: 370543
2019-08-30 22:40:51 +00:00
Reid Kleckner 657a06c619 [MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives
llvm-svn: 370540
2019-08-30 22:25:55 +00:00
Reid Kleckner a33474d595 [X86] Print register names in .seh_* directives
Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.

Differential Revision: https://reviews.llvm.org/D66625

llvm-svn: 370533
2019-08-30 21:23:05 +00:00
Reid Kleckner 0bb1630685 [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn
Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.

TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.

There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co

The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.

The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.

I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet.  I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.

MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.

Differential Revision: https://reviews.llvm.org/D66980

llvm-svn: 370525
2019-08-30 20:46:39 +00:00
Craig Topper 18e8d02e8c [X86] Pass v32i16/v64i8 in zmm registers on KNL target.
gcc and icc pass these types in zmm registers in zmm registers.

This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.

Fixes PR42957

Differential Revision: https://reviews.llvm.org/D66708

llvm-svn: 370495
2019-08-30 17:35:08 +00:00
Craig Topper 66f03ba17d [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site.
I'm looking at unfolding broadcast loads on AVX512 which will
require refactoring this code to select broadcast opcodes instead
of regular load/stores in some cases. Merging them to avoid
further complicating their interfaces.

llvm-svn: 370484
2019-08-30 16:05:57 +00:00
Craig Topper 160ed4cab4 [X86] Explicitly list all the always trivially rematerializable instructions.
Add a default with an llvm_unreachable for anything we don't expect.

This seems safer that just blindly returning true for anything
missing from the switch.

llvm-svn: 370424
2019-08-30 00:54:36 +00:00
Reid Kleckner 5b79e603d3 [X86] Don't emit unreachable stack adjustments
Summary:
This is a minor improvement on our past attempts to do this. Fixes
PR43155.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66905

llvm-svn: 370409
2019-08-29 21:24:41 +00:00
Reid Kleckner 81e458d001 Allow '@' to appear in x86 mingw symbols
Summary:
There is no reason to differ in assembler behavior here between -msvc
and -gnu targets. Without this setting, the text after the '@' is
interpreted as a symbol variable, like foo@IMGREL.

Reviewers: mstorsjo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66974

llvm-svn: 370408
2019-08-29 21:15:02 +00:00
Simon Pilgrim 3d705a1fa4 [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159)
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.

llvm-svn: 370404
2019-08-29 20:22:08 +00:00
Craig Topper 5a43fdd313 [X86] Remove what little support we had for MPX
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define

I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.

gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX

Differential Revision: https://reviews.llvm.org/D66669

llvm-svn: 370393
2019-08-29 18:09:02 +00:00
Roman Lebedev cc7495a355 [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel
Summary:
We were previously doing it in DAGCombine.
But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine.
So if we had `sub %x, -1`, we'll transform it to `add %x, 1`,
which `combineIncDecVector()` will immediately transform back into `sub %x, -1`,
and here we go again...

I've marked this as NFC since not a single test changes,
but since that 'changes' DAGCombine, probably this isn't fully NFC.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62327

llvm-svn: 370327
2019-08-29 10:50:09 +00:00
Craig Topper c96284002e [X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load.
The DAG should have these as X86VBroadcast+load.

llvm-svn: 370299
2019-08-29 06:36:16 +00:00
Craig Topper 231e628d69 [X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types.
We should always be shrinking the input to 128 bits or smaller
when the node is created.

llvm-svn: 370296
2019-08-29 06:02:11 +00:00
Craig Topper 1ec5c204b8 [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns.
We had an isel pattern to perform this, but its better to
do it in DAG combine as a simplification. This also fixes the lack
of patterns for AVX512 targets.

llvm-svn: 370294
2019-08-29 05:48:48 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Craig Topper af364131af [X86] Fix a couple isel patterns to not shrink a volatile load.
Also add a FIXME because I'm not sure why these patterns exist. Looks like a missing combine.

And another FIXME because the AVX512 equivalent one of the patterns is missing.

llvm-svn: 370276
2019-08-28 23:45:10 +00:00
Hans Wennborg cff90f07cb [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711)
Neither libgcc or compiler-rt are usually used on Windows, so these
functions can't be called.

Differential revision: https://reviews.llvm.org/D66880

llvm-svn: 370204
2019-08-28 13:55:10 +00:00
Vlad Tsyrklevich 57076d3199 Revert "Change the X86 datalayout to add three address spaces for 32 bit signed,"
This reverts commit r370083 because it caused check-lld failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 370142
2019-08-28 01:08:54 +00:00
Amy Huang 1299945b81 Change the X86 datalayout to add three address spaces for 32 bit signed,
32 bit unsigned, and 64 bit pointers.

llvm-svn: 370083
2019-08-27 17:46:53 +00:00
Craig Topper fc1f08c2f2 [X86] Remove encoding information from the TAILJMP instructions that are lowered by MCInstLowering. Fix LowerPATCHABLE_TAIL_CALL to also convert them to regular JMP/JCC instructions
There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering. So normally there encoding information isn't used. The exception being when XRay wraps them in PATCHABLE_TAIL_CALL.

For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information.

This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering patch. Then removes the encoding information.

Differential Revision: https://reviews.llvm.org/D66561

llvm-svn: 370079
2019-08-27 17:24:23 +00:00
Simon Pilgrim 8912e2af39 [X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR
Differential Revision: https://reviews.llvm.org/D66527

llvm-svn: 370055
2019-08-27 13:13:17 +00:00
Pengfei Wang 564fb58a32 [WinEH] Allocate space in funclets stack to save XMM CSRs
Summary:
This is an alternate approach to D63396

Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an excpetion is
thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead RBP to access memory.

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>

Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66596

Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
llvm-svn: 370005
2019-08-27 01:53:24 +00:00
Craig Topper 6db7f492d9 [X86] Delay combineIncDecVector until after op legalization.
Probably better to keep add over sub in early DAG combines.

It might make sense to push this to lowering or delay it all
the way to isel. But this was the simplest change.

llvm-svn: 369981
2019-08-26 22:17:54 +00:00
Craig Topper 36d1588f01 [X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle
ANY_EXTEND_VECTOR_INREG isn't currently marked Legal which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.

This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests. The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.

Differential Revision: https://reviews.llvm.org/D66436

llvm-svn: 369942
2019-08-26 18:23:26 +00:00
Craig Topper b8b90ac1c5 [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef
Summary:
Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66504

llvm-svn: 369872
2019-08-25 17:59:49 +00:00
Craig Topper 1abe162a9a [X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC.
INC/DEC don't use an immediate so we don't need to count it. We
also shouldn't use the custom isel for it.

Fixes PR42998.

llvm-svn: 369863
2019-08-25 05:22:40 +00:00
Craig Topper cc4b0596b1 [X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes.
llvm-svn: 369859
2019-08-24 23:14:57 +00:00
Craig Topper dd2cf78381 [X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again.
llvm-svn: 369837
2019-08-24 05:59:46 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Craig Topper bc173d4c51 [X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS.
CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps,
but we shouldn't produce INSERT_SUBVECTORS from there. We should
keep CONCAT_VECTORS until vector legalization.

Noticed while looking at the madd_quad_reduction test from madd.ll

llvm-svn: 369802
2019-08-23 19:52:24 +00:00
Craig Topper bccd183217 [X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests.
llvm-svn: 369792
2019-08-23 18:05:37 +00:00
Craig Topper e7211bb567 [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

Preventing the removal of the AND to clear the upper bits of result

Differential Revision: https://reviews.llvm.org/D53022

llvm-svn: 369780
2019-08-23 17:14:58 +00:00
Simon Pilgrim c88408cf85 Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI.
llvm-svn: 369751
2019-08-23 12:37:09 +00:00
Andrea Di Biagio 8e9af64da6 [X86][BtVer2] Add a read-advance to every implicit register use of CMPXCHG8B/16B.
This is a follow up of r369642.

This patch assigns a ReadAfterLd to every implicit register use of instruction
CMPXCHG8B and instruction CMPXCHG16B. Perf micro-benchmarks show that implicit
registers are read after 3cy from the start of execution.

llvm-svn: 369750
2019-08-23 12:19:45 +00:00
Andrea Di Biagio 1630f64e2f [X86][BtVer2] Fix latency of ALU RMW instructions.
Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed
latency of all other RMW integer arithmetic/logic instructions is 6cy and not
5cy.

Example (ADD):

```
addb $0, (%rsp)            # Latency: 6cy
addb $7, (%rsp)            # Latency: 6cy
addb %sil, (%rsp)          # Latency: 6cy

addw $0, (%rsp)            # Latency: 6cy
addw $511, (%rsp)          # Latency: 6cy
addw %si, (%rsp)           # Latency: 6cy

addl $0, (%rsp)            # Latency: 6cy
addl $511, (%rsp)          # Latency: 6cy
addl %esi, (%rsp)          # Latency: 6cy

addq $0, (%rsp)            # Latency: 6cy
addq $511, (%rsp)          # Latency: 6cy
addq %rsi, (%rsp)          # Latency: 6cy
```

The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.

The observed latency of ADC/SBB is 7-8cy. So we need a different write to model
those.  Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower
than what the model for btver2 currently reports).

Differential Revision: https://reviews.llvm.org/D66636

llvm-svn: 369748
2019-08-23 11:34:10 +00:00
Craig Topper 4deb388bca [X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros.
CONCAT_VECTORS is more canonical for the early DAG combine runs
until we start getting into the op legalization phases.

llvm-svn: 369734
2019-08-23 06:08:33 +00:00
Craig Topper bdceb9fb14 [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern.
For v2i32 we only feed 2 i8 elements into the psadbw instructions
with 0s in the other 14 bytes. The resulting psadbw instruction
will produce zeros in bits [127:16] of the output. We need to take
the result and feed it to a v2i32 add where the first element
includes bits [15:0] of the sad result. The other element should
be zero.

Prior to this patch we were using a truncate to take 0 from
bits 95:64 of the psadbw. This results in a pshufd to move those
bits to 63:32. But since we also have zeroes in bits 63:32 of
the psadbw output, we should just take those bits.

The previous code probably worked better with promoting legalization,
but now we use widening legalization. I've preserved the old
behavior if -x86-experimental-vector-widening-legalization=false
until we get that option removed.

llvm-svn: 369733
2019-08-23 05:33:27 +00:00
Sam Clegg 90b6bb75e8 [MC] Minor cleanup to MCFixup::Kind handling. NFC.
Prefer `MCFixupKind` where possible and add getTargetKind() to
convert to `unsigned` when needed rather than scattering cast
operators around the place.

Differential Revision: https://reviews.llvm.org/D59890

llvm-svn: 369720
2019-08-23 01:00:55 +00:00
Francis Visoiu Mistrih 5b5ee61b5f [MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local
Local symbols in the indirect symbol table contain the value
`INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must
contain the address of the target.

In r349060, I added support for local symbols in the indirect symbol
table, which was checking if the symbol `isDefined` && `!isExternal` to
determine if the symbol is local or not.

It turns out that `isDefined` will return false if the user of the
symbol comes before its definition, and we'll again generate .long 0
which will be the symbol at the adress 0x0.

Instead of doing that, use GlobalValue::hasLocalLinkage() to check if
the symbol is local.

Differential Revision: https://reviews.llvm.org/D66563

llvm-svn: 369671
2019-08-22 16:59:00 +00:00
Craig Topper 898a0e9b84 [X86] Remove MCInstLower code that drops operands from some CALL and TAILJMP instructions. Add asserts to verify operand count
It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least aportion of fixing it.

Differential Revision: https://reviews.llvm.org/D66570

llvm-svn: 369665
2019-08-22 16:23:35 +00:00
Andrea Di Biagio c9649eb9da [X86][BtVer2] Fix latency/throughput of scalar integer MUL instructions.
Single operand MUL instructions that implicitly set EAX have the following
latency/throughput profile (see below):

imul %cl              # latency: 3cy - uOPs: 1 - 1 JMul
imul %cx              # latency: 3cy - uOPs: 3 - 3 JMul
imul %ecx             # latency: 3cy - uOPs: 2 - 2 JMul
imul %rcx             # latency: 6cy - uOPs: 2 - 4 JMul

mul %cl               # latency: 3cy - uOPs: 1 - 1 JMul
mul %cx               # latency: 3cy - uOPs: 3 - 3 JMul
mul %ecx              # latency: 3cy - uOPs: 2 - 2 JMul
mul %rcx              # latency: 6cy - uOPs: 2 - 4 JMul

Excluding the 64bit variant, which has a latency of 6cy, every other instruction
has a latency of 3cy. However, the number of decoded macro-opcodes (as well as
the resource cyles) depend on the MUL size.

The two operand MULs have a more predictable profile (see below):

imul %dx, %dx         # latency: 3cy - uOPs: 1 - 1 JMul
imul %edx, %edx       # latency: 3cy - uOPs: 1 - 1 JMul
imul %rdx, %rdx       # latency: 6cy - uOPs: 1 - 4 JMul

imul $3, %dx, %dx     # latency: 4cy - uOPs: 2 - 2 JMul
imul $3, %ecx, %ecx   # latency: 3cy - uOPs: 1 - 1 JMul
imul $3, %rdx, %rdx   # latency: 6cy - uOPs: 1 - 4 JMul

This patch updates the values in the Jaguar scheduling model and regenerates
llvm-mca tests.

Differential Revision: https://reviews.llvm.org/D66547

llvm-svn: 369661
2019-08-22 15:20:16 +00:00
Andrea Di Biagio c6744055ad [X86][BtVer2] Fix latency and throughput of XCHG and XADD.
On Jaguar, XCHG has a latency of 1cy and decodes to 2 macro-opcodes. Maximum
throughput for XCHG is 1 IPC. The byte exchange has worse latency and decodes to
1 extra uOP; maximum observed throughput is 0.5 IPC.

```
xchgb %cl, %dl           # Latency: 2cy  -  uOPs: 3  -  2 ALU
xchgw %cx, %dx           # Latency: 1cy  -  uOPs: 2  -  2 ALU
xchgl %ecx, %edx         # Latency: 1cy  -  uOPs: 2  -  2 ALU
xchgq %rcx, %rdx         # Latency: 1cy  -  uOPs: 2  -  2 ALU
```

The reg-mem forms of XCHG are atomic operations with an observed latency of
16cy.  The resource usage is similar to the XCHGrr variants. The biggest
difference is obviously the bus-locking, which prevents the LS to issue other
memory uOPs in parallel until the unlocking store uOP is executed.

```
xchgb %cl, (%rsp)        # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgw %cx, (%rsp)        # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgl %ecx, (%rsp)       # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
xchgq %rcx, (%rsp)       # Latency: 16cy  -  uOPs: 3 - ECX latency: 11cy
```

The exchanged in/out register operand becomes available after 11cy from the
start of execution. Added test xchg.s to verify that we correctly see that
register write committed in 11cy (and not 16cy).

Reg-reg XADD instructions have the same latency/throughput than the byte
exchange (register-register variant).

```
xaddb %cl, %dl           # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddw %cx, %dx           # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddl %ecx, %edx         # latency: 2cy  -  uOPs: 3  -  3 ALU
xaddq %rcx, %rdx         # latency: 2cy  -  uOPs: 3  -  3 ALU
```

The non-atomic RM variants have a latency of 11cy, and decode to 4
macro-opcodes. They still consume 2 ALU pipes, and the exchange in/out register
operand becomes available in 3cy (it matches the 'load-to-use latency').

```
xaddb %cl, (%rsp)        # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddw %cx, (%rsp)        # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddl %ecx, (%rsp)       # latency: 11cy  -  uOPs: 4  -  3 ALU
xaddq %rcx, (%rsp)       # latency: 11cy  -  uOPs: 4  -  3 ALU
```

The atomic XADD variants execute in 16cy. The in/out register operand is
available after 11cy from the start of execution.

```
lock xaddb %cl, (%rsp)   # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddw %cx, (%rsp)   # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddl %ecx, (%rsp)  # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
lock xaddq %rcx, (%rsp)  # latency: 16cy - uOPs: 4 - 3 ALU -- ECX latency: 11cy
```

Added test xadd.s to verify those latencies as well as read-advance values.

Differential Revision: https://reviews.llvm.org/D66535

llvm-svn: 369642
2019-08-22 11:32:47 +00:00
Simon Pilgrim 6dd51c2f19 [MVT] Add MVT equivalent to EVT::getHalfNumVectorElementsVT() helper. NFCI.
Allows for some cleanup in a lot of SSE/AVX vector splitting code

llvm-svn: 369640
2019-08-22 11:14:30 +00:00
Craig Topper d420616313 [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization.
I don't really understand the costs we're using for fp_to_sint,
but prior to widening legalization we used 20 as the cost for this
via the v2i64->v2f64 entry. That number seems better than the 40
we got with widening legalization. So now we need either a
v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether
AVX is enabled or not since we skip the first SSE2 table look up
under AVX.

llvm-svn: 369628
2019-08-22 08:18:45 +00:00
Pengfei Wang 7630e24492 [X86] Making X86OptimizeLEAs pass public. NFC
Reviewers: wxiao3, LuoYuanke, andrew.w.kaylor, craig.topper, annita.zhang, liutianle, pengfei, xiangzhangllvm, RKSimon, spatel, andreadb

Reviewed By: RKSimon

Subscribers: andreadb, hiraditya, llvm-commits

Tags: #llvm

Patch by Gen Pei (gpei)

Differential Revision: https://reviews.llvm.org/D65933

llvm-svn: 369612
2019-08-22 02:29:27 +00:00
Craig Topper 78e6507b0a [X86] Correct the scheduler classes for TAILJMP and TCRETURN CodeGenOnly instructions.
We had an odd combination of WriteJump applied to some memory
instructions and WriteJumpLd applied to register and immediate
instructions.

Thsi should hopefully assign them all correctly.

llvm-svn: 369599
2019-08-21 23:17:52 +00:00
Craig Topper 303bbc3be2 [X86] Replace a couple hardcoded '5's with X86::AddrNumOperands for readability. NFC
llvm-svn: 369598
2019-08-21 22:40:07 +00:00
Amara Emerson 56606a4db3 [AArch64][GlobalISel] Add support for narrowScalar of G_ZEXT
We do this by merging the source with the high bits set to 0.

Differential Revision: https://reviews.llvm.org/D66181

llvm-svn: 369480
2019-08-21 00:12:37 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Reid Kleckner 22fb734907 Revert [WinEH] Allocate space in funclets stack to save XMM CSRs
This reverts r367088 (git commit 9ad565f70e)

And the follow up fix r368631 / e9865b9b31

llvm-svn: 369457
2019-08-20 22:08:57 +00:00
Craig Topper 3a2b08e6c9 [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target
Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))).

Differential Revision: https://reviews.llvm.org/D66489

llvm-svn: 369434
2019-08-20 20:20:04 +00:00
Craig Topper 250951abf5 [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the impliciting zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.

The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.

llvm-svn: 369431
2019-08-20 19:43:48 +00:00
Martin Storsjo 514f3a122d [TargetMachine] Don't try to create COFFSTUB references on windows on non-COFF
This avoids spurious relocation types for windows/elf targets.

Differential Revision: https://reviews.llvm.org/D66401

llvm-svn: 369426
2019-08-20 18:58:05 +00:00
Andrea Di Biagio 2e897a94f5 [X86][BtVer2] Use ReadAfterLd entries for the register operands of CMPXCHG.
This is a follow-up of r369365.

llvm-svn: 369412
2019-08-20 17:05:56 +00:00
Craig Topper 22ac9f396f [X86] Use isNullConstant instead of getConstantOperandVal == 0. NFC
llvm-svn: 369410
2019-08-20 16:55:12 +00:00
Andrea Di Biagio 16111d3795 [X86][BtVer2] Fix latency and throughput of atomic INC/DEC/NEG/NOT.
Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy.
Number of uOPs is still 1.

Differential Revision: https://reviews.llvm.org/D66469

llvm-svn: 369388
2019-08-20 14:31:27 +00:00
Andrea Di Biagio b1bdd97a26 [X86][Btver2] Fix latency and throughput of CMPXCHG instructions.
On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC.
Throughput is superiorly limited to 0.33 because of the implicit in/out
dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the
same memory location, store-to-load forwarding occurs and values for sequent
loads are quickly forwarded from the store buffer.

Interestingly, the functionality in LLVM that computes the reciprocal throughput
doesn't seem to know about RMW instructions. That functionality only looks at
the "consumed resource cycles" for the throughput computation. It should be
fixed/improved by a future patch. In particular, for RMW instructions, that
logic should also take into account for the write latency of in/out register
operands.

An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to
~17cy/inst due to cache locking, which prevents other memory uOPs to start
executing before the "lock releasing" store uOP.

CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less
macro opcode. Their latency tend to be the same as the other RR/RM variants. RR
variants are relatively fast 3cy (but still microcoded - 5 macro opcodes).

CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load
forwarding. That means, throughput is clearly limited by the in/out dependency
on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs
for the Integer pipes). I have reused the same mix of consumed resource from the
other CMPXCHG instructions for CMPXCHG8B too.
LOCK CMPXCHG8B is instead 18cycles.

CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to
the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38)
cycles dependeing on whether the LOCK prefix is specified or not.
I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x
microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to
2x the resource cycles used for CMPXCHG8B.

The two new hasLockPrefix() functions are used by the btver2 scheduling model
check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have
been encoded in predicates of variant scheduling classes that describe lat/thr
of CMPXCHG.

Differential Revision: https://reviews.llvm.org/D66424

llvm-svn: 369365
2019-08-20 10:23:55 +00:00
Craig Topper 1ada137854 [X86] Add back the -x86-experimental-vector-widening-legalization comand line flag and all associated code, but leave it enabled by default
Google is reporting performance issues with the new default behavior
and have asked for a way to switch back to the old behavior while we
investigate and make fixes.

I've restored all of the code that had since been removed and added
additional checks of the command flag onto code paths that are
not otherwise guarded by a check of getTypeAction.

I've also modified the cost model tables to hopefully get us back
to the previous costs.

Hopefully we won't need to support this for very long since we
have no test coverage of the old behavior so we can very easily
break it.

llvm-svn: 369332
2019-08-20 06:58:00 +00:00
Craig Topper a0d92c7262 [X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.
The motivating case are the changes in vector-reduce-add.ll where
we were doing extra work in the scalar domain instead of shuffling.
There may be some one use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.

Differential Revision: https://reviews.llvm.org/D66071

llvm-svn: 369287
2019-08-19 18:15:50 +00:00
Craig Topper ebb7ddc633 [X86] Teach lower1BitShuffle to match right shifts with upper zero elements on types that don't natively support KSHIFT.
We can support these by widening to a supported type,
then shifting all the way to the left and then
back to the right to ensure that we shift in zeroes.

llvm-svn: 369232
2019-08-19 05:45:39 +00:00
Craig Topper e47437a6ef [X86] Fix the lower1BitShuffle code added in r369215 to correctly pass the widened vector to the KSHIFT node.
Not sure how to test this as we have tests that exercise this code,
but nothing failed for the types not matching. Since all the k-registers
use equivalent register classes everything just ends up working.

llvm-svn: 369228
2019-08-19 04:08:44 +00:00
Craig Topper 269c6b1c15 [X86] Teach lower1BitShuffle to match KSHIFTR that doesn't use Zeroable and only relies on undef.
This allows us to widen the type when the KSHIFTR instruction
doesn't exist for the type. If we need to shift in zeroes into
the upper elements we would need more work to guarantee zeroes
when widening.

llvm-svn: 369227
2019-08-19 04:08:40 +00:00
Craig Topper 2eb7951da3 [X86] Teach lower1BitShuffle to recognize padding a subvector with zeros with V2 as the source and V1 as the zero vector.
Shuffle canonicalization can swap the sources so the zero vector
might be V1 and the subvector that's being padded can be V2.

llvm-svn: 369226
2019-08-19 00:39:22 +00:00
Craig Topper 2ee46c7c4b [X86] Add a special case to LowerCONCAT_VECTORSvXi1 to handle concatenating zero vectors followed by one non-zero vector followed by undef vectors.
For such a case we should only need a KSHIFTL, but we were
previously generating a KSHIFTL followed by a KSHIFTR because
we mistakenly believed we need to zero the undef elements.

llvm-svn: 369224
2019-08-18 23:30:11 +00:00
Craig Topper 388b8dd94a [X86] Replace uses of getZeroVector for vXi1 vectors with DAG.getConstant.
vXi1 vectors don't need special handling.

llvm-svn: 369222
2019-08-18 23:30:03 +00:00
Craig Topper 9e074c06fe [X86] Improve lower1BitShuffle handling for KSHIFTL on narrow vectors.
We can insert the value into a larger legal type and shift that
by the desired amount.

llvm-svn: 369215
2019-08-18 18:52:46 +00:00
Simon Pilgrim 63b3c56fca Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 369213
2019-08-18 17:26:30 +00:00
Simon Pilgrim fee2546f3f [X86] isTargetShuffleEquivalent - add BUILD_VECTOR matching
Add similar functionality to isShuffleEquivalent - if the mask elements don't match, try matching the BUILD_VECTOR scalars instead.

As target shuffles need to handle SM_Sentinel values, this can get a bit tricky, so commit just adds actual mask element index handling - full SM_SentinelZero support will be added when the need arises.

Also, enables support in matchVectorShuffleWithPACK

llvm-svn: 369212
2019-08-18 17:15:26 +00:00
Simon Pilgrim a66edd86e2 [X86] isTargetShuffleEquivalent - early out on illegal shuffle masks. NFCI.
Simplifies shuffle mask comparisons by just bailing out if the shuffle mask has any out of range values - will make an upcoming patch much simpler.

llvm-svn: 369211
2019-08-18 16:37:58 +00:00
Craig Topper 31f829f0cd [X86] Add a one use check to the combineStore code that handles v16i16->v16i8 truncate+store by extending to v16i32 and then emitting a v16i32->v16i8 truncstore.
This prevent us from emitting a separate truncate and a truncating
store instruction.

llvm-svn: 369200
2019-08-17 22:46:15 +00:00
Jordan Rupprecht d0797ece46 Revert [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
This reverts r368662 (git commit 1a8d790cf5)

The compile-time regression repro is in https://bugs.llvm.org/show_bug.cgi?id=43024

llvm-svn: 369167
2019-08-16 23:08:56 +00:00
Craig Topper a17d1d2250 [X86] Use Register/MCRegister in more places in X86
This was a quick pass through some obvious places. I haven't tried the clang-tidy check.

I also replaced the zeroes in getX86SubSuperRegister with X86::NoRegister which is the real sentinel name.

Differential Revision: https://reviews.llvm.org/D66363

llvm-svn: 369151
2019-08-16 20:50:23 +00:00
Simon Pilgrim 63b78b678b [X86] resolveTargetShuffleInputs - add DemandedElts variant. NFCI.
Nothing calls this yet, everything still goes through the non (all) DemandedElts wrapper.

llvm-svn: 369136
2019-08-16 18:13:22 +00:00
Simon Pilgrim 8ff1b7de4d [X86] combineExtractWithShuffle - handle extract(truncate(x), 0)
Eventually we need to generalize combineExtractWithShuffle to handle all faux shuffles and handle truncate (and X86ISD::VTRUNC etc.) there, but we're not ready yet (still creates nodes on the fly, incomplete DemandedElts support, bad use of recursive Depth limit).

llvm-svn: 369134
2019-08-16 17:35:08 +00:00
Simon Pilgrim 3a8c698771 [X86] Alphabetize pass initialization definitions. NFCI.
llvm-svn: 369126
2019-08-16 16:41:38 +00:00
Simon Pilgrim 9da4989c52 [X86] Remove unused include. NFCI.
We don't use anything from TargetOptions.h directly and its included via TargetLowering.h anyhow.

llvm-svn: 369110
2019-08-16 14:05:46 +00:00
Craig Topper 120cffccf8 [X86] Manually reimplement getTargetInsertSubreg in X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant.
This is needed to maintain the topological sort order.

Fixes PR42992.

llvm-svn: 369084
2019-08-16 04:47:44 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Craig Topper 2a372ba534 [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory.
llvm-svn: 369031
2019-08-15 18:23:37 +00:00
Craig Topper 6eebd2bcd7 [X86] Improve cost model for subvector extraction of less than 128-bit vectors
Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.

Differential Revision: https://reviews.llvm.org/D65892

llvm-svn: 369022
2019-08-15 17:29:42 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
Craig Topper 1e246b20c0 [X86] Add isel pattern to match VZEXT_MOVL and a v2i64 scalar_to_vector bitcasted from x86mmx to MOVQ2DQ.
We already had the pattern for just the scalar to vector and bitcast,
but not the case where we wanted zeroes in the high half of the xmm.

llvm-svn: 368972
2019-08-15 06:46:30 +00:00
Craig Topper e6409602a1 [X86] Make sure load is non-volatile in the MMX_X86movdq2q (loadv2i64) isel pattern.
This pattern will narrow the load so we should make sure its not
volatile.

llvm-svn: 368971
2019-08-15 06:46:26 +00:00
Craig Topper dbcbbf5658 [X86] Remove unneeded isel pattern for v4f32->v4i32 fp_to_sint and conversion to MMX.
fp_to_sint is turned into X86cvttp2si during isel preprocessing.
The other redundant isel patterns were removed previously, but I
missed this one because its in the MMX td file.

llvm-svn: 368968
2019-08-15 05:52:02 +00:00
Craig Topper a57734ba4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64.
The default legalization can take care of this.

llvm-svn: 368967
2019-08-15 05:51:58 +00:00
Craig Topper 57286afe4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast.
The generic legalization handles this in the same way so just use
that.

llvm-svn: 368966
2019-08-15 05:51:54 +00:00
Craig Topper ba39fcd8c6 [X86] Remove some unreachable code from LowerBITCAST.
llvm-svn: 368965
2019-08-15 05:51:50 +00:00
Craig Topper 14f7560020 [X86] Remove some dead code and combine some repeated code that's left.
If the width is 256 bits, then we must have AVX so the else here
was unnecessary. Once that's removed then the >= 256 bit code is
identical to the 128 bit code with a different VT so combine them.

llvm-svn: 368956
2019-08-15 04:07:43 +00:00
Craig Topper 3e44d96170 [X86] Use PSADBW for v8i8 addition reductions.
Improves the 8 byte case from PR42674.

Differential Revision: https://reviews.llvm.org/D66069

llvm-svn: 368864
2019-08-14 15:57:29 +00:00
Craig Topper 30d3e9c395 [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs
Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.

For AVX2, when the destination type is legal its clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high.

Differential Revision: https://reviews.llvm.org/D66169

llvm-svn: 368858
2019-08-14 14:52:39 +00:00
Craig Topper 8c545168ee [X86] Add llvm_unreachable to a switch that covers all expected values.
llvm-svn: 368857
2019-08-14 14:51:19 +00:00
Simon Pilgrim e7b350a5d1 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368662

Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368663
2019-08-13 11:11:42 +00:00
Simon Pilgrim 1a8d790cf5 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368662
2019-08-13 10:51:39 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Amara Emerson e14c91b71a [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained.
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.

AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.

Differential Revision: https://reviews.llvm.org/D65984

llvm-svn: 368652
2019-08-13 06:26:59 +00:00
Reid Kleckner e9865b9b31 [WinEH] Fix catch block parent frame pointer offset
r367088 made it so that funclets store XMM registers into their local
frame instead of storing them to the parent frame. However, that change
forgot to update the parent frame pointer offset for catch blocks. This
change does that.

Fixes crashes when an exception is rethrown in a catch block that saves
XMMs, as described in https://crbug.com/992860.

llvm-svn: 368631
2019-08-12 23:02:00 +00:00
Craig Topper e07e593782 [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW.
We need AVX512BW to be able to truncate an i16 vector. If we don't
have that we have to extend i16->i32, then trunc, i32->i8. But we
won't be able to remove the min/max if we do that. At least not
without more special handling.

llvm-svn: 368623
2019-08-12 22:18:23 +00:00
Craig Topper 0761a38e8a [X86] Remove unreachable code from LowerTRUNCATE. NFC
All three 256->128 bit cases were already handled above.

Noticed while looking at the coverage report.

llvm-svn: 368609
2019-08-12 19:26:45 +00:00
Craig Topper a3605baaff [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores.
If we're after type legalize, we should make sure we won't create
a store with an illegal type when we separate the AVG pattern
from the truncating store.

I don't know of a way to fail for this today. Just noticed while
I was in the vicinity.

llvm-svn: 368608
2019-08-12 19:26:37 +00:00
Craig Topper 1b02909847 [X86] Simplify creation of saturating truncating stores.
We just need to check if the truncating store is legal
instead of going through isSATValidOnAVX512Subtarget.

llvm-svn: 368607
2019-08-12 19:26:30 +00:00
Craig Topper 3f4e9b156d [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC
We have no custom trunc stores on X86.

llvm-svn: 368606
2019-08-12 19:26:22 +00:00
Craig Topper 09d5d15339 [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256.
Under this config, the v16f32 type we try to use isn't to a register
class so the getRegClassFor call will fail.

llvm-svn: 368594
2019-08-12 17:43:26 +00:00
Simon Pilgrim 182249daee [X86][SSE] ComputeKnownBits - add basic PSADBW handling
llvm-svn: 368558
2019-08-12 12:19:19 +00:00
Pengfei Wang e28cbbd5d4 [X86] Support -march=tigerlake
Support -march=tigerlake for x86.
Compare with Icelake Client, It include 4 more new features ,they are
avx512vp2intersect, movdiri, movdir64b, shstk.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D65840

llvm-svn: 368543
2019-08-12 01:29:46 +00:00
Craig Topper ce6a2cf966 [X86] Simplify some of the type checks in combineSubToSubus.
If we have SSE2 we can handle any i8/i16 type and let
type legalization deal with it.

llvm-svn: 368538
2019-08-11 17:36:49 +00:00
Craig Topper 637964bfd8 [X86] Don't use SplitOpsAndApply for ISD::USUBSAT.
Target independent type legalization and custom lowering
should be able to handle it.

llvm-svn: 368537
2019-08-11 17:36:45 +00:00
Craig Topper 9758e0e1bf [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization.
llvm-svn: 368523
2019-08-11 02:17:18 +00:00
Craig Topper 0f74b82ef1 [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization.
The test case that changed is probably better served through
allowing combineTruncatedArithmetic to create narrow vectors. It
also appears InstCombine would have simplified this test case
to remove the zext and trunc anyway.

llvm-svn: 368522
2019-08-11 02:08:38 +00:00
Simon Pilgrim ec128709f0 [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG
On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.

This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend codes.

Differential Revision: https://reviews.llvm.org/D65741

llvm-svn: 368515
2019-08-10 16:46:07 +00:00
Craig Topper 74c43a2277 [X86] Match the IR pattern form movmsk on SSE1 only targets where v4i32 isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where its obvious the input won't be scalarized resulting in building a vector just do to the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.

This fixes the case from PR42870. Buts its probably still broken the presence of logic ops feeding the movmsk pattern which would further hide the v4f32 type.

Reviewers: spatel, RKSimon, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65689

llvm-svn: 368506
2019-08-10 07:51:13 +00:00
Craig Topper a8e5e73711 [X86] Improve the diagnostic for larger than 4-bit immediate for vpermil2pd/ps. Only allow MCConstantExprs.
llvm-svn: 368505
2019-08-10 04:28:52 +00:00
Luo, Yuanke c6c86f4f81 [X86] Fix stack probe issue on windows32.
Summary:
On windows if the frame size exceed 4096 bytes, compiler need to
generate a call to _alloca_probe. X86CallFrameOptimization pass
changes the reserved stack size and cause of stack probe function
not be inserted. This patch fix the issue by detecting the call
frame size, if the size exceed 4096 bytes, drop X86CallFrameOptimization.

Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon

Reviewed By: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65923

llvm-svn: 368503
2019-08-10 02:49:02 +00:00