Commit Graph

73 Commits

Author SHA1 Message Date
Clement Courbet 30477197b3 [ExpandMemCmp][NFC] Add more tests. 2020-03-10 11:34:19 +01:00
Clement Courbet 6518b72f93 [ExpandMemCmp] Properly constant-fold all compares.
Summary:
This gets rid of duplicated code and diverging behaviour w.r.t.
constants.
Fixes PR45086.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75519
2020-03-09 10:40:52 +01:00
Clement Courbet f7e6f5f8e3 [ExpandMemCmp] Properly constant-fold all compares.
Summary:
This gets rid of duplicated code and diverging behaviour w.r.t.
constants.
Fixes PR45086.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75519
2020-03-09 09:10:34 +01:00
Clement Courbet c68d35d78c [ExpandMemCmp] Add more tests to show missing constant folding. 2020-03-03 14:57:11 +01:00
David Zarzycki cb6822c9de
[X86] Reland: Enable YMM memcmp with AVX1
Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp.

This is a follow up to D68632 which enabled XOR compares which made this possible.

This also updates the memcmp-optsize.ll test unlike the first patch.

https://reviews.llvm.org/D69658
2019-11-01 08:58:48 +02:00
Simon Pilgrim 04813ded98 Revert rG0e252ae19ff8d99a59d64442c38eeafa5825d441 : [X86] Enable YMM memcmp with AVX1
Breaks build bots

Differential Revision: https://reviews.llvm.org/D69658
2019-10-31 19:05:04 +00:00
David Zarzycki 0e252ae19f [X86] Enable YMM memcmp with AVX1
Update TargetTransformInfo to allow AVX1 to use YMM registers for memcmp.

This is a follow up to D68632 which enabled XOR compares which made this possible.

https://reviews.llvm.org/D69658
2019-10-31 20:07:07 +02:00
David Zarzycki 657e4240b1 [X86] Fix 48/96 byte memcmp code gen
Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert
it to ISD::INSERT_SUBVECTOR.

https://reviews.llvm.org/D69464
2019-10-28 08:41:45 +02:00
David Zarzycki 11c920207a [X86] Prefer KORTEST on Knights Landing or later for memcmp()
PTEST and especially the MOVMSK instructions are slow on Knights Landing
or later. As a bonus, this patch increases instruction parallelism by
emitting:
    KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0
Instead of:
    KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0

https://reviews.llvm.org/D69157
2019-10-26 21:14:57 +03:00
David Zarzycki 0d0509384f [X86] NFC: expand inline memcmp test coverage
1) Adds SSE4.1 coverage.
2) Adds prefer-256-bit or not coverage.
3) Adds more power-of-two tests up to 512 bytes.
4) Adds power-of-two-minus-one tests to verify overlapping loads.
5) Adds power-of-two-plus-one-half tests (48, 96, 192, and 384).
6) Adds greater-than/less-than tests from 16 to 512 bytes.

https://reviews.llvm.org/D69222
2019-10-26 21:14:57 +03:00
Simon Pilgrim ef04598e14 [X86] Regenerate memcmp tests and add X64-AVX512 common prefix
Should help make the changes in D69157 clearer

llvm-svn: 375215
2019-10-18 09:59:51 +00:00
David Zarzycki 59390efef2 [X86] Make memcmp() use PTEST if possible and also enable AVX1
llvm-svn: 374922
2019-10-15 17:40:12 +00:00
David Zarzycki 7653ff398d [X86] Enable AVX512BW for memcmp()
llvm-svn: 373845
2019-10-06 10:25:52 +00:00
David Zarzycki 03b216d854 [X86] Enable inline memcmp() to use AVX512
llvm-svn: 373706
2019-10-04 07:42:34 +00:00
Dmitri Gribenko 2bf8d77453 Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.""
This reverts commit r371502, it broke tests
(clang/test/CodeGenCXX/auto-var-init.cpp).

llvm-svn: 371507
2019-09-10 10:39:09 +00:00
Clement Courbet 612c260ec3 Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
With a fix for sanitizer breakage (see explanation in D60318).

llvm-svn: 371502
2019-09-10 09:18:00 +00:00
Clement Courbet 2851248fa1 Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
Breaks sanitizers:
    libFuzzer :: cxxstring.test
    libFuzzer :: memcmp.test
    libFuzzer :: recommended-dictionary.test
    libFuzzer :: strcmp.test
    libFuzzer :: value-profile-mem.test
    libFuzzer :: value-profile-strcmp.test

llvm-svn: 364416
2019-06-26 12:13:13 +00:00
Clement Courbet 7b3a5f0e6d [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
This allows later passes (in particular InstCombine) to optimize more
cases.

One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.

llvm-svn: 364412
2019-06-26 11:50:18 +00:00
Clement Courbet be98e0ab78 [ExpandMemCmp] Honor prefer-vector-width.
Reviewers: gchatelet, echristo, spatel, atdt

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63769

llvm-svn: 364384
2019-06-26 07:06:49 +00:00
Clement Courbet 1d8c9dfe03 [ExpandMemCmp][NFC] Add tests for `memcmp(p, q, n) < 0` case.
llvm-svn: 357767
2019-04-05 15:03:25 +00:00
Clement Courbet 8e16d73346 [SelectionDAG] Allow the user to specify a memeq function.
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.

`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.

Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56593

llvm-svn: 355672
2019-03-08 09:07:45 +00:00
Craig Topper 509a8a3cf1 [DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag.
This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality

Differential Revision: https://reviews.llvm.org/D58884

llvm-svn: 355324
2019-03-04 19:12:16 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Clement Courbet 7925d58eae [X86][NFC] Add more constant-size memcmp tests.
llvm-svn: 348257
2018-12-04 12:35:51 +00:00
Sanjay Patel 5a48aef3f0 [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325

We're trading branch and compares for loads and logic ops. 
This makes the code smaller and hopefully faster in most cases.

The 24-byte test shows an interesting construct: we load the trailing scalar 
elements into vector registers and generate the same pcmpeq+movmsk code that 
we expected for a pair of full vector elements (see the 32- and 64-byte tests).

Differential Revision: https://reviews.llvm.org/D41714

llvm-svn: 321934
2018-01-06 16:16:04 +00:00
Francis Visoiu Mistrih 25528d6de7 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Clement Courbet 063bed9baf re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/.

llvm-svn: 317318
2017-11-03 12:12:27 +00:00
Clement Courbet 82bade615b Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage

This reverts commit eea333c33fa73ad225ef28607795984829f65688.

llvm-svn: 317213
2017-11-02 15:53:10 +00:00
Clement Courbet 1dc37b9c3b [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass.
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.

See the discussion in PR33325 for more details.

Reviewers: spatel

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D39456

llvm-svn: 317211
2017-11-02 15:02:51 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Clement Courbet be684eee82 [CodeGen][ExpandMemCmp][NFC] Simplify load sequence generation.
llvm-svn: 316763
2017-10-27 12:34:18 +00:00
Clement Courbet b237f044a4 [CodeGen][ExpandMemcmp][NFC] Make tests more complete.
llvm-svn: 316749
2017-10-27 08:33:51 +00:00
Clement Courbet 0c7cd071f7 Re-land "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)"
Compute the actual decomposition only after deciding whether to expand
of not. Else, it's easy to make the compiler OOM with:
`memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if
someone mistakenly passes a negative value. Add a test.

This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.

llvm-svn: 316567
2017-10-25 11:02:09 +00:00
Sanjay Patel f762c7b32f [x86] add more vector ISA variants for memcmp expansion; NFC
...because every swiss cheese has different holes.

llvm-svn: 316446
2017-10-24 15:27:47 +00:00
Sanjay Patel 169dae70a6 [x86] use more shift or LEA for select-of-constants (2nd try)
The previous rev (r310208) failed to account for overflow when subtracting the
constants to see if they're suitable for shift/lea. This version add a check
for that and more test were added in r310490.

We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies
(they always become shl/lea) to avoid cmov or branching. The current code misses
cases where we have a negative constant and a positive constant, so this is just
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start
with a select in IR, create a select DAG node, convert it into a sext, convert it
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the 
   push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a 
   post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if
   that's a regression, but those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs.
5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310717
2017-08-11 15:44:14 +00:00
Sanjay Patel 807f92b8ff [x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097)
llvm-svn: 310264
2017-08-07 15:47:48 +00:00
Sanjay Patel a923c2ee95 [x86] use more shift or LEA for select-of-constants
We can convert any select-of-constants to math ops:
http://rise4fun.com/Alive/d7d

For this patch, I'm enhancing an existing x86 transform that uses fake multiplies 
(they always become shl/lea) to avoid cmov or branching. The current code misses 
cases where we have a negative constant and a positive constant, so this is just 
trying to plug that hole.

The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start 
with a select in IR, create a select DAG node, convert it into a sext, convert it 
back into a select, and then lower it to sext machine code.

Some notes about the test diffs:

1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR.
2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I 
   think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on 
   a different reg? That's a post-DAG problem though.
3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if 
   that's a regression, but I think those would always be nearly equivalent.
4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs.
5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0.
6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again.
7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops.

Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass.

Differential Revision: https://reviews.llvm.org/D35340

llvm-svn: 310208
2017-08-06 16:27:07 +00:00
Sanjay Patel 4dbdd470a8 [CGP] use narrower types in memcmp expansion when possible
This only affects very small memcmp on x86 for now, but it
will become more important if we allow vector-sized load and
compares.

llvm-svn: 309711
2017-08-01 17:24:54 +00:00
Sanjay Patel fea731a4aa [CGP] use subtract or subtract-of-cmps for result of memcmp expansion
As noted in the code comment, transforming this in the other direction might require 
a separate transform here in CGP given the block-at-a-time DAG constraint.

Besides that theoretical motivation, there are 2 practical motivations for the 
subtract-of-cmps form:

1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). 
   There is discussion about canonicalizing IR to the select form 
   ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), 
   so we probably need to add DAG transforms for those patterns anyway, but this improves the 
   memcmp output without waiting for that step.

2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
   that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
   if we choose to enable that.

Differential Revision: https://reviews.llvm.org/D34904

llvm-svn: 309597
2017-07-31 18:08:24 +00:00
Simon Pilgrim 18b97f78fe [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).

Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).

This patch should be considered for the 5.0.0 release branch as well

Differential Revision: https://reviews.llvm.org/D35830

llvm-svn: 308986
2017-07-25 17:04:37 +00:00
Simon Pilgrim 3459f108f8 [X86] Add 24-byte memcmp tests (PR33914)
llvm-svn: 308963
2017-07-25 10:33:36 +00:00
Simon Pilgrim 483927aefb [x86, CGP] increase memcmp() expansion up to 4 load pairs
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.

Notes:

    Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.

    There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.

    We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:

https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

llvm-svn: 308322
2017-07-18 15:55:30 +00:00
Simon Pilgrim 420e5eadc2 [X86] Added cmov target to memcmp test
As discussed by @spatel on D35067:

"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."

llvm-svn: 308309
2017-07-18 14:19:34 +00:00
Simon Pilgrim e5e9232260 [X86] Updated 32-bit memcmp tests to run with/without SSE2
llvm-svn: 306816
2017-06-30 11:23:59 +00:00
Sanjay Patel 4b23fa0abf [CGP] add specialization for memcmp expansion with only one basic block
llvm-svn: 306485
2017-06-27 23:15:01 +00:00
Sanjay Patel 70b36f193d [CGP] eliminate a sub instruction in memcmp expansion
As noted in D34071, there are some IR optimization opportunities that could be 
handled by normal IR passes if this expansion wasn't happening so late in CGP.

Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, 
so I'm proposing this change:
  %s = sub i32 %x, %y
  %r = icmp ne %s, 0
    =>
  %r = icmp ne %x, %y

Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.

The fact that the PowerPC backend doesn't eliminate the 'subf.' might be 
something for PPC folks to investigate separately.

Differential Revision: https://reviews.llvm.org/D34416

llvm-svn: 306471
2017-06-27 21:46:34 +00:00
Sanjay Patel 0656629b87 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
2017-06-20 15:58:30 +00:00
Sanjay Patel b5a03797df [x86] remove unused param from tests; NFC
llvm-svn: 304989
2017-06-08 17:02:39 +00:00