Commit Graph

26361 Commits

Author SHA1 Message Date
David Bolvansky d0080c3a5f [DAGCombiner] Fold 0 div/rem X to 0
Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover

Reviewed By: RKSimon

Subscribers: craig.topper, llvm-commits

Differential Revision: https://reviews.llvm.org/D52504

llvm-svn: 345721
2018-10-31 14:18:57 +00:00
Nicolai Haehnle 814abb59df AMDGPU: Rewrite SILowerI1Copies to always stay on SALU
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.

Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.

This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".

There are still some relevant cases where code quality could be
improved, in particular:

- We often introduce redundant masks with EXEC. Ideally, we'd
  have a generic computeKnownBits-like analysis to determine
  whether masks are already masked by EXEC, so we can avoid this
  masking both here and when lowering uniform control flow.

- The criterion we use to determine whether a def is observed
  from outside a loop is conservative: it doesn't check whether
  (loop) branch conditions are uniform.

Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53496

llvm-svn: 345719
2018-10-31 13:27:08 +00:00
Nicolai Haehnle 28212cc689 AMDGPU: Remove PHI loop condition optimization
Summary:
The optimization to early break out of loops if all threads are dead was
never fully implemented.

But the PHI node analyzing is actually causing a number of problems, so
remove all the extra code for it.

(This does actually regress code quality in a few places because it
 ends up relying more heavily on phi's of i1, which we don't do a
 great job with. However, since it fixes real bugs in the wild, we
 should take this change. I have some prototype changes to improve
 i1 lowering in general -- not just for control flow -- which should
 help recover the code quality, I just need to make those changes
 fit for general consumption. -- Nicolai)

Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53359

llvm-svn: 345718
2018-10-31 13:26:48 +00:00
Neil Henning 63718b214a [AMDGPU] support image load/store a16
Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).

Fix our isel lowering and add test cases to prove it out.

Differential Revision: https://reviews.llvm.org/D53750

llvm-svn: 345710
2018-10-31 10:34:48 +00:00
Sanjin Sijaric fadebc8aae [ARM64] [Windows] Exception handling support in frame lowering
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue.  These are used by the MCLayer to
populate the .xdata section.

Differential Revision: https://reviews.llvm.org/D50288

llvm-svn: 345701
2018-10-31 09:27:01 +00:00
Martin Storsjo 315357faca [AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk
This is similar to SVN r311061 for ARM.

Differential Revision: https://reviews.llvm.org/D53878

llvm-svn: 345698
2018-10-31 08:14:09 +00:00
Matthias Braun a83403892a MachineOperand/MIParser: Do not print debug-use flag, infer it
The debug-use flag must be set exactly for uses on DBG_VALUEs.  This is
so obvious that it can be trivially inferred while parsing. This will
reduce noise when printing while omitting an information that has little
value to the user.

The parser will keep recognizing the flag for compatibility with old
`.mir` files.

Differential Revision: https://reviews.llvm.org/D53903

llvm-svn: 345671
2018-10-30 23:28:27 +00:00
David Bolvansky f739066e8a [ARM][NFC] Make tests immune to better div optimizations
Summary: Related to D52504

Reviewers: spatel

Reviewed By: spatel

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53901

llvm-svn: 345665
2018-10-30 22:08:13 +00:00
Konstantin Zhuravlyov 2d22d24ac4 Revert r345542: AMDGPU: Enable code object v3 by default
It breaks mesa.

llvm-svn: 345662
2018-10-30 22:02:40 +00:00
Cameron McInally 2ad870e785 [FPEnv] [FPEnv] Add constrained intrinsics for MAXNUM and MINNUM
Differential Revision: https://reviews.llvm.org/D53216

llvm-svn: 345650
2018-10-30 21:01:29 +00:00
Sanjay Patel caf5f5c490 [x86] try to make test immune to better div optimization; NFCI
llvm-svn: 345642
2018-10-30 20:46:23 +00:00
Mandeep Singh Grang 71e0cc2a0b [COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg
Summary:
    Thunk functions in Windows are varag functions that call a musttail function
    to pass the arguments after the fixup is done.  We need to make sure that we
    forward the arguments from the caller vararg to the callee vararg function.
    This is the same mechanism that is used for Windows on X86.

Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53843

llvm-svn: 345641
2018-10-30 20:46:10 +00:00
Sanjay Patel 00935bc1ba [x86] try to make test immune to better div optimization; NFCI
llvm-svn: 345640
2018-10-30 20:44:54 +00:00
Sanjay Patel 059ebc90ee [x86] try to make test immune to better div optimization; NFCI
llvm-svn: 345639
2018-10-30 20:42:03 +00:00
Bjorn Pettersson fe09a20f09 [DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad
Summary:
Normalize the offset for endianess before checking
if the store cover the load in ForwardStoreValueToDirectLoad.

Without this we missed out on some optimizations for big
endian targets. If for example having a 4 bytes store followed
by a 1 byte load, loading the least significant byte from the
store, the STCoversLD check would fail (see @test4 in
test/CodeGen/AArch64/load-store-forwarding.ll).

This patch also fixes a problem seen in an out-of-tree target.
The target has i40 as a legal type, it is big endian,
and the StoreSize for i40 is 48 bits. So when normalizing
the offset for endianess we need to take the StoreSize into
account (assuming that padding added when storing into
a larger StoreSize always is added at the most significant
end).

Reviewers: niravd

Reviewed By: niravd

Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho

Differential Revision: https://reviews.llvm.org/D53776

llvm-svn: 345636
2018-10-30 20:16:39 +00:00
Eli Friedman 93d0129b78 [AArch64] [Windows] SEH opcodes should be scheduling boundaries.
Prevents the post-RA scheduler from modifying the prologue sequences
emitting by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.

isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.

Differential Revision: https://reviews.llvm.org/D53851

llvm-svn: 345634
2018-10-30 19:24:51 +00:00
David Greene 3e89fa8e08 [AArch64] Create proper memoperand for multi-vector stores
Re-apply r345315 with testcase fixes.

Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345631
2018-10-30 19:17:51 +00:00
Craig Topper 6958b5ffa9 [X86] In lowerVectorShuffleAsBroadcast, make peeking through CONCAT_VECTORS work correctly if we already walked through a bitcast that changed the element size.
The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors.

This patch switchs to using the concat_vectors input element count directly instead.

Differential Revision: https://reviews.llvm.org/D53823

llvm-svn: 345626
2018-10-30 18:48:42 +00:00
Jonas Paulsson 611b533f1d [SchedModel] Fix for read advance cycles with implicit pseudo operands.
The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.

RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.

This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.

Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671

llvm-svn: 345606
2018-10-30 15:04:40 +00:00
Sanjay Patel 8b207defea [DAGCombiner] narrow vector binops when extraction is cheap
Narrowing vector binops came up in the demanded bits discussion in D52912.

I don't think we're going to be able to do this transform in IR as a canonicalization 
because of the risk of creating unsupported widths for vector ops, but we already have 
a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is 
currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression 
test diffs).

This is artificially limited to not look through bitcasts because there are so many 
test diffs already, but that's marked with a TODO and is a small follow-up.

Differential Revision: https://reviews.llvm.org/D53784

llvm-svn: 345602
2018-10-30 14:14:34 +00:00
Francis Visoiu Mistrih 85d3f1ee8f [llc] Error out when -print-machineinstrs is used with an unknown pass
We used to assert instead of reporting an error.

PR39494

llvm-svn: 345589
2018-10-30 12:07:18 +00:00
Roman Lebedev 9ffca9b83c [X86] Add extra-uses on the mask of pattern c of extract-{low,}bits.ll tests
Summary:
Because of the D48768, that pattern is always unfolded into pattern d,
thus we had no test coverage.

Reviewers: RKSimon, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53574

llvm-svn: 345583
2018-10-30 11:12:29 +00:00
Simon Pilgrim 858303b827 [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.

This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.	
	
Differential Revision: https://reviews.llvm.org/D53760

llvm-svn: 345578
2018-10-30 10:32:11 +00:00
David Bolvansky dfdbb038e8 [DAGCombiner] Improve X div/rem Y fold if single bit element type
Summary: Tests by @spatel, thanks

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: sdardis, atanasyan, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D52668

llvm-svn: 345575
2018-10-30 09:07:22 +00:00
Craig Topper b293322cee [LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened
Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86.

Reviewers: efriedma, RKSimon

Reviewed By: efriedma

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53229

llvm-svn: 345567
2018-10-30 03:27:15 +00:00
Craig Topper 2640795c94 [AArch64] Add test case for D53229. NFC
llvm-svn: 345566
2018-10-30 03:27:13 +00:00
Matt Arsenault abc4f29f9c AMDGPU: Remove custom BUILD_VECTOR combine
This was looping in a testcase and removing it
now slightly improves a test.

llvm-svn: 345560
2018-10-30 01:37:59 +00:00
Matt Arsenault b0b741efb8 AMDGPU: Use scavengeRegisterBackwards
llvm-svn: 345559
2018-10-30 01:33:14 +00:00
Konstantin Zhuravlyov 5cb950200c AMDGPU: Enable code object v3 by default
Differential Revision: https://reviews.llvm.org/D53525

llvm-svn: 345542
2018-10-29 21:07:27 +00:00
Jessica Paquette e3932eeea4 [MachineOutliner] Inherit target features from parent function
If a function has target features, it may contain instructions that aren't
represented in the default set of instructions. If the outliner pulls out one
of these instructions, and the function doesn't have the right attributes
attached, we'll run into an LLVM error explaining that the target doesn't
support the necessary feature for the instruction.

This makes outlined functions inherit target features from their parents.

It also updates the machine-outliner.ll test to check that we're properly
inheriting target features.

llvm-svn: 345535
2018-10-29 20:27:07 +00:00
Matthias Braun c045c557b0 Relax fast register allocator related test cases; NFC
- Relex hard coded registers and stack frame sizes
- Some test cleanups
- Change phi-dbg.ll to match on mir output after phi elimination instead
  of going through the whole codegen pipeline.

This is in preparation for https://reviews.llvm.org/D52010
I'm committing all the test changes upfront that work before and after
independently.

llvm-svn: 345532
2018-10-29 20:10:42 +00:00
Thomas Lively eb15d00193 [WebAssembly] Lower away condition truncations for scalar selects
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53676

llvm-svn: 345521
2018-10-29 18:38:12 +00:00
Simon Pilgrim 3a2f3c2c0a [X86][SSE] getFauxShuffleMask - Fix shuffle mask adjustment for multiple inserted subvectors
Part of the issue discovered in PR39483, although its not fully exposed until I reapply rL345395 (by reverting rL345451)

llvm-svn: 345520
2018-10-29 18:25:48 +00:00
Stanislav Mekhanoshin 79080ecd82 [AMDGPU] Match v_swap_b32
Differential Revision: https://reviews.llvm.org/D52677

llvm-svn: 345514
2018-10-29 17:26:01 +00:00
Francis Visoiu Mistrih 61c9de7565 [X86] Enable the MachineVerifier by default
The machine verifier was disabled for x86 by default. There are now only
9 tests failing, compared to what previously was between 20 and 30.

This is a good opportunity to file bugs for all the remaining issues,
then explicitly disable the failing tests and enabling the machine
verifier by default.

This allows us to avoid adding new tests that break the verifier.

PR27481

llvm-svn: 345513
2018-10-29 16:57:43 +00:00
Leonard Chan 905abe5b5d [Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53783

llvm-svn: 345512
2018-10-29 16:54:37 +00:00
Luke Cheeseman 71c989ae1f [AArch64] Return address signing B key support
- Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return
  address signing
- The key used to sign the function is controlled by the function attribute
  "sign-return-address-key"

Differential Revision: https://reviews.llvm.org/D51427

llvm-svn: 345511
2018-10-29 16:26:58 +00:00
Francis Visoiu Mistrih bb1bd9ed79 [X86] Remove outdated test
This test breaks the X86 MachineVerifier. It looks like the MIR part is
completely useless.

The original author suggests that it can be removed.

Differential Revision: https://reviews.llvm.org/D53767

llvm-svn: 345501
2018-10-29 13:41:46 +00:00
Sjoerd Meijer d208461e2d [ARM][NFC] Fix test inlineasm-X-allocation.ll
Differential Revision: https://reviews.llvm.org/D53748

llvm-svn: 345491
2018-10-29 08:45:56 +00:00
Craig Topper aa5eb2fbaa [X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers.
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.

Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

llvm-svn: 345488
2018-10-29 04:52:04 +00:00
Craig Topper 42aa87143d [X86] Recognize constant splats in LowerFCOPYSIGN.
llvm-svn: 345484
2018-10-28 23:51:35 +00:00
Craig Topper 8164f3923e [X86] Add test case to show failure to handle splat vectors in the constant check in LowerFCOPYSIGN.
llvm-svn: 345483
2018-10-28 23:51:33 +00:00
Roman Lebedev 1c340bc7ca [X86][NFC] sse42-schedule.ll: disable XOP for BdVer2 tests
Else we are clearly testing the wrong instruction.

llvm-svn: 345476
2018-10-28 13:39:10 +00:00
Roman Lebedev 3adf88b746 [X86][NFC] sse41-schedule.ll: disable XOP for BdVer2 tests
Else we are clearly testing the wrong instruction.

llvm-svn: 345475
2018-10-28 13:39:06 +00:00
Roman Lebedev cc554e4456 [X86][NFC] sse2-schedule.ll: disable XOP for BdVer2 tests
Else we are clearly testing the wrong instruction.

llvm-svn: 345474
2018-10-28 13:39:01 +00:00
Simon Pilgrim 9b77f0c291 [VectorLegalizer] Enable TargetLowering::expandFP_TO_UINT support.
Add vector support to TargetLowering::expandFP_TO_UINT.

This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.

llvm-svn: 345473
2018-10-28 13:07:25 +00:00
Craig Topper c4b785ae1e [DAGCombiner] Better constant vector support for FCOPYSIGN.
Enable constant folding when both operands are vectors of constants.

Turn into FNEG/FABS when the RHS is a splat constant vector.

llvm-svn: 345469
2018-10-28 01:32:49 +00:00
Craig Topper f206447dcd [X86] Add test cases showing missed opportunities for optimizing vector fcopysign when the RHS is a splat constant.
llvm-svn: 345468
2018-10-28 01:32:47 +00:00
Roman Lebedev a5baf86744 AMD BdVer2 (Piledriver) Initial Scheduler model
Summary:
# Overview
This is somewhat partial.
* Latencies are good {F7371125}
  * All of these remaining inconsistencies //appear// to be noise/noisy/flaky.
* NumMicroOps are somewhat good {F7371158}
  * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes
* Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there.
  They are basically verbatum copy from `btver2`
* Many `InstRW`. And there are still inconsistencies left...

To be noted:
I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis!

# Benchmark
I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed | RawSpeed raw image decoding library ]].
Diff (the exact clang from trunk without/with this patch):
```
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean                              -0.0607         -0.0604           234           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median                            -0.0630         -0.0626           233           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev                            +0.2581         +0.2587             1             2             1             2
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean                              -0.0770         -0.0767           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median                            -0.0767         -0.0763           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev                            -0.4170         -0.4156             1             0             1             0
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue                                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean                                           -0.0271         -0.0270           463           450           463           450
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median                                         -0.0093         -0.0093           453           449           453           449
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev                                         -0.7280         -0.7280            13             4            13             4
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean                                           -0.0065         -0.0065           569           565           569           565
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median                                         -0.0077         -0.0077           569           564           569           564
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev                                         +1.0077         +1.0068             2             5             2             5
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue                                          0.0220          0.0199      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean                                           +0.0006         +0.0007           312           312           312           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median                                         +0.0031         +0.0032           311           312           311           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev                                         -0.7069         -0.7072             4             1             4             1
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean                                           -0.0015         -0.0015           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median                                         -0.0010         -0.0011           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev                                         -0.1486         -0.1456             0             0             0             0
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue                                          0.6139          0.8766      U Test, Repetitions: 25 vs 25
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean                                           -0.0008         -0.0005            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median                                         -0.0006         -0.0002            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev                                         -0.1467         -0.1390             0             0             0             0
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue                                          0.0137          0.0137      U Test, Repetitions: 25 vs 25
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean                                           +0.0002         +0.0002           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median                                         -0.0015         -0.0014           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev                                         +3.3687         +3.3587             0             2             0             2
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue                                     0.4041          0.3933      U Test, Repetitions: 25 vs 25
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean                                      +0.0004         +0.0004            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median                                    -0.0000         -0.0000            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev                                    +0.1947         +0.1995             0             0             0             0
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue                              0.0074          0.0001      U Test, Repetitions: 25 vs 25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean                               -0.0092         +0.0074           547           542            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median                             -0.0054         +0.0115           544           541            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev                             -0.4086         -0.3486             8             5             0             0
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue                                        0.3320          0.0000      U Test, Repetitions: 25 vs 25
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean                                         +0.0015         +0.0204           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median                                       +0.0001         +0.0203           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev                                       +0.2259         +0.2023             1             1             0             0
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue                                      0.0000          0.0001      U Test, Repetitions: 25 vs 25
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean                                       -0.0209         -0.0179            96            94            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median                                     -0.0182         -0.0155            95            93            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev                                     -0.6164         -0.2703             2             1             2             1
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue                                     0.0000          0.0000      U Test, Repetitions: 25 vs 25
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean                                      -0.0098         -0.0098           176           175           176           175
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median                                    -0.0126         -0.0126           176           174           176           174
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev                                    +6.9789         +6.9157             0             2             0             2
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 25 vs 25
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean                  -0.0237         -0.0238           474           463           474           463
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median                -0.0267         -0.0267           473           461           473           461
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev                +0.7179         +0.7178             3             5             3             5
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue                   0.6837          0.6554      U Test, Repetitions: 25 vs 25
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean                    -0.0014         -0.0013          1375          1373          1375          1373
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median                  +0.0018         +0.0019          1371          1374          1371          1374
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev                  -0.7457         -0.7382            11             3            10             3
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue                                        0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean                                         -0.0080         -0.0289            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median                                       -0.0070         -0.0287            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev                                       +1.0977         +0.6614             0             0             0             0
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue                                       0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean                                        +0.0132         +0.0967            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median                                      +0.0132         +0.0956            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev                                      -0.0407         -0.1695             0             0             0             0
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue                                      0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean                                       +0.0331         +0.1307            13            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median                                     +0.0430         +0.1373            12            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev                                     -0.9006         -0.8847             1             0             0             0
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue                                            0.0016          0.0010      U Test, Repetitions: 25 vs 25
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean                                             -0.0023         -0.0024           395           394           395           394
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median                                           -0.0029         -0.0030           395           394           395           393
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev                                           -0.0275         -0.0375             1             1             1             1
Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue                                          0.0232          0.0000      U Test, Repetitions: 25 vs 25
Phase One/P65/CF027310.IIQ/threads:8/real_time_mean                                           -0.0047         +0.0039           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_median                                         -0.0050         +0.0037           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev                                         -0.0599         -0.2683             1             1             0             0
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean                           +0.0206         +0.0207           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median                         +0.0204         +0.0205           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev                         +0.2155         +0.2212             1             1             1             1
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean                          -0.0109         -0.0108           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median                        -0.0104         -0.0103           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev                        -0.4919         -0.4800             0             0             0             0
Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue                                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean                                          -0.0149         -0.0147           220           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_median                                        -0.0173         -0.0169           221           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev                                        +1.0337         +1.0341             1             3             1             3
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue                                         0.0001          0.0001      U Test, Repetitions: 25 vs 25
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean                                          -0.0019         -0.0019           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median                                        -0.0021         -0.0021           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev                                        -0.4441         -0.4282             0             0             0             0
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue                                0.0000          0.4263      U Test, Repetitions: 25 vs 25
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean                                 +0.0258         -0.0006            81            83            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median                               +0.0235         -0.0011            81            82            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev                               +0.1634         +0.1070             1             1             0             0
```
{F7443905}
If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`),
and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`);
Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%`
Looks good so far i'd say.

llvm-exegesis details:
{F7371117} {F7371125}
{F7371128} {F7371144} {F7371158}

Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh

Reviewed By: andreadb

Subscribers: javed.absar, gbedwell, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52779

llvm-svn: 345463
2018-10-27 20:46:30 +00:00
Roman Lebedev a51921877a [NFC][X86] Baseline tests for AMD BdVer2 (Piledriver) Scheduler model
Adding the baseline tests in a preparatory NFC commit,
so that the actual commit shows the *diff*.

Yes, i'm aware that a few of these codegen-based sched tests
are testing wrong instructions, i will fix that afterwards.

For https://reviews.llvm.org/D52779

llvm-svn: 345462
2018-10-27 20:36:11 +00:00