Summary:
MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.
Inspired by PR38840.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D52121
llvm-svn: 342326
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814
If this works, it's an easier and more powerful solution than adding pattern matching
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might
not be able to make a blendv out of it again. I don't think that's too likely, but
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild
before expanding the transform.
Differential Revision: https://reviews.llvm.org/D52059
llvm-svn: 342324
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g UBFX).
Teach it to preserve debug locations and salvage debug values.
llvm-svn: 342319
Summary:
Implement shifts of vectors by i32. Since LLVM defines shifts as
binary operations between two vectors, this involves pattern matching
on splatted shift operands. For v2i64 shifts any i32 shift operands
have to be zero extended in the input and any i64 shift operands have
to be wrapped in the output. Depends on D52007.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51906
llvm-svn: 342302
Currently if we got something like `const Foo` we'd ignore it and
just rely on printing the unmodified `Foo` later on. However,
for testing the native reading code we really would like to be able
to see these so that we can verify that the native reader can
actually handle them. Instead of printing out the full type though,
just print out the header.
llvm-svn: 342295
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
D52075 should be able to use this code too rather than
duplicating all of the logic.
llvm-svn: 342292
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.
"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.
Fixes PR38661
Reviewers: hfinkel, cuviper
Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D51108
llvm-svn: 342288
Summary:
This change makes the tests more focused and avoids problematic
interactions between the testing modes and instruction encoding. This
change also allows the other tests to use less verbose output and
stricter checks.
Reviewers: aheejin, dschuff, aardappel
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52007
llvm-svn: 342287
The patch saves a function offset table which maps function name index to the
offset of its function profile to the start of the binary profile. By using
the function offset table, for those function profiles which will not be used
when compiling a module, the profile reader does't have to read them. For
profile size around 10~20M, it saves ~10% compile time.
Differential Revision: https://reviews.llvm.org/D51863
llvm-svn: 342283
This test constructs a non-readable file of mode 0111, which lingers in the test output directory and will cause EACCES to various tools (rg, rsync, ...)
llvm-svn: 342279
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
llvm-svn: 342278
The test used to fail with an invalid phi node: the two predecessors were outlined
and the SSA representation was left invalid. The patch adds the exit block to the
cold region.
llvm-svn: 342277
This patch fixes the debug info handling for SelectionDAG legalization
of DAG nodes with two results. When an replaced SDNode has more than
one result, transferDbgValues was always copying the SDDbgValue from
the first result and attaching them to all members. In reality
SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes
(though the type signature doesn't make this obvious (cf. the call
site code in ReplaceNode()).
rdar://problem/44162227
Differential Revision: https://reviews.llvm.org/D52112
llvm-svn: 342264
Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation.
As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work.
Differential Revision: https://reviews.llvm.org/D52043
llvm-svn: 342257
The folds are not limited to zext, and the real goal is width
reduction of a math op. D52075 is proposing to extend this to
subtracts.
llvm-svn: 342254
There was a bug in a check line regex that could cause the test to fail
with a naming difference. The auto-gen script seems to work as expected now.
llvm-svn: 342249
Also added a C-interface function for large values, and updated
llvm-lto's --thinlto-cache-max-size-bytes switch to take a type larger
than int.
The maximum cache size in terms of bytes is a 64-bit number. However,
the methods to set it only took unsigned previously, which meant that
the maximum cache size could not be specified above 4GB. That's quite
small compared to the output of some projects, so it makes sense to
provide the ability to set larger values in that field.
We also needed a C-interface function that provides a greater range
than the existing thinlto_codegen_set_cache_size_bytes, which also only
takes an unsigned, so this change also adds
hinlto_codegen_set_cache_size_megabytes.
Reviewed by: mehdi_amini, tejohnson, steven_wu
Differential Revision: https://reviews.llvm.org/D52023
llvm-svn: 342233
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+
Also updated the tests to check correct behaviour
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342210
As preparation for LoopInterchange becoming a loop pass, it needs to
preserve ScalarEvolution. Even though interchanging should not change
the trip count of the loop, it modifies loop entry, latch and exit
blocks.
I added -verify-scev to some loop interchange tests, but the verification does
not catch problems caused by missing invalidation of SE in loop interchange, as
the trip counts themselves do not change. So there might be potential to
make the SE verification covering more stuff in the future.
Reviewers: mkazantsev, efriedma, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52026
llvm-svn: 342209
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
Summary:
[VPlan] Implement vector code generation support for simple outer loops.
Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:
- force vector code generation using explicit vectorize_width
- add conservative early returns in cost model and other places for VPlanNativePath
- add code for setting up outer loop inductions
- support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
- support for generating uniform inner branches
We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal
Reviewed By: fhahn, hsaito
Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D50820
llvm-svn: 342197
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 342186
INLINEASM instructions use extra operands to carry flags. If a register operand is removed without removing the flag operand, then the flags will no longer make sense.
This patch fixes this by preventing the removal when a flag operand is present.
The included test case was generated by MS inline assembly. Longer term maybe we should fix the inline assembly parsing to not generate redundant operands.
Differential Revision: https://reviews.llvm.org/D51829
llvm-svn: 342176
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.
Reviewers: eli.friedman, craig.topper
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51502
llvm-svn: 342175
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA
We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.
https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52001
llvm-svn: 342173
This adds DebugCounter support to the PartiallyInlineLibCalls pass,
which should make debugging/automated bisection easier in the future.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50093
llvm-svn: 342172
Came up in https://reviews.llvm.org/D52001#1233827
While we don't do a good job here, we at least want to make
sure that we don't have any inf-loops.
llvm-svn: 342171