Summary:
This patch extends CFGSort pass to support exception handling. Once it
places a loop header, it does not place blocks that are not dominated by
the loop header until all the loop blocks are sorted. This patch extends
the same algorithm to exception 'catch' part, using the information
calculated by WebAssemblyExceptionInfo class.
Reviewers: dschuff, sunfish
Subscribers: sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D46500
llvm-svn: 339172
Instcombine gets some, but not all, of these cases via
it's internal reassociation transforms. It fails in
all cases with vector types.
llvm-svn: 339168
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. Previous logic was
incorrect, and might lead to the crash with PHIs and EH instructions.
Reviewers: efriedma, javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50143
llvm-svn: 339166
getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in
https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0
instead of the true hot/cold cutoff value. The patch fixes the mistake. But the
mistake seems not causing big performance difference according to internal server
benchmarks testing.
Differential Revision: https://reviews.llvm.org/D50370
llvm-svn: 339162
Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code.
Differential Revision: https://reviews.llvm.org/D50374
llvm-svn: 339157
In combineMetadata, we should be able to preserve K's nonnull metadata,
if K does not move. This condition should hold for all replacements by
NewGVN/GVN, but I added a bunch of assertions to verify that.
Fixes PR35038.
There probably are additional kinds of metadata that could be preserved
using similar reasoning. This is follow-up work.
Reviewers: dberlin, davide, efriedma, nlopes
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D47339
llvm-svn: 339149
This fixes an inconsistency in code generation when compiling with or
without debug information (-g). When debug information is available in
an empty block, the original test would fail, resulting in possibly
different code.
Patch by: Jeroen Dobbelaere
Differential revision: https://reviews.llvm.org/D49467
llvm-svn: 339129
When potential jump instruction and target are in the same segment, use
jump instruction with immediate field.
In cases where offset does not fit immediate value of a bc/j instructions,
offset is stored into register, and then jump register instruction is used.
Differential Revision: https://reviews.llvm.org/D48019
llvm-svn: 339126
Summary:
The accelerator tables use the debug_str section to store their strings.
However, they do not support the indirect method of access that is
available for the debug_info section (DW_FORM_strx et al.).
Currently our code is assuming that all strings can/will be referenced
indirectly, and puts all of them into the debug_str_offsets section.
This is generally true for regular (unsplit) dwarf, but in the DWO case,
most of the strings in the debug_str section will only be used from the
accelerator tables. Therefore the contents of the debug_str_offsets
section will be largely unused and bloating the main executable.
This patch rectifies this by teaching the DwarfStringPool to
differentiate between strings accessed directly and indirectly. When a
user inserts a string into the pool it has to declare whether that
string will be referenced directly or not. If at least one user requsts
indirect access, that string will be assigned an index ID and put into
debug_str_offsets table. Otherwise, the offset table is skipped.
This approach reduces the overall binary size (when compiled with
-gdwarf-5 -gsplit-dwarf) in my tests by about 2% (debug_str_offsets is
shrunk by 99%).
Reviewers: probinson, dblaikie, JDevlieghere
Subscribers: aprantl, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D49493
llvm-svn: 339122
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.
It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.
Differential Revision: https://reviews.llvm.org/D49248
llvm-svn: 339121
I was trying to add a test case for LLD and found that it
is impossible to set sh_entsize via yaml.
The patch implements the missing part.
Differential revision: https://reviews.llvm.org/D50235
llvm-svn: 339113
Summary:
Wasm does not have direct counterparts to some of LLVM IR's atomicrmw
instructions (min, max, umin, umax, and nand). This enables atomic
expansion using cmpxchg instruction within a loop for those atomicrmw
instructions.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D49440
llvm-svn: 339084
Summary:
The issue with the python path is that the path to python on Windows can contain spaces. To make the tests always work, the path to python needs to be surrounded by quotes.
This change updates several configuration files which specify the path to python as a substitution and also remove quotes from existing tests.
Reviewers: asmith, zturner, alexshap, jakehehrlich
Reviewed By: zturner, alexshap, jakehehrlich
Subscribers: mehdi_amini, nemanjai, eraman, kbarton, jakehehrlich, steven_wu, dexonsmith, stella.stamenova, delcypher, llvm-commits
Differential Revision: https://reviews.llvm.org/D50206
llvm-svn: 339073
This matches our behaviour for regular (i.e. relocated) references to
private symbols and therefore avoids needing to unnecessarily write
address-significant .L symbols to the object file's symbol table,
which can interfere with stack traces.
Fixes check-cfi after r339050.
llvm-svn: 339066
Everything should quiet, and I think everything should
flush.
I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.
There are still more operations that need to
be handled.
llvm-svn: 339065
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.
llvm-svn: 339064
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.
Handle selects and fcopysign.
The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.
llvm-svn: 339061
This assert fires when attempting to extract a subregister from the
global PIC base register. This virtual register SD node is not in the
VRBaseMap, so we shouldn't call getVR to look it up there. If this is a
RegisterSDNode, we should be able to use the virtual register directly.
Fixes PR38385
llvm-svn: 339056
Properly shrink `pow()` to `powf()` as a binary function and, when no other
simplification applies, do not discard it.
Differential revision: https://reviews.llvm.org/D50113
llvm-svn: 339046
Summary:
Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB)
instructions in more cases. For example, the following sequence
a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)
generates an fsub and fma without this patch and an fnma with this
change.
Reviewers: craig.topper
Subscribers: llvm-commits, davidxl, wmi
Differential Revision: https://reviews.llvm.org/D48467
llvm-svn: 339043
If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source
Differential Revision: https://reviews.llvm.org/D50270
llvm-svn: 339041
for all the uses from the same def is done.
We run into a compile time problem with flex generated code combined with
`-fno-jump-tables`. The cause is that machineLICM hoists a lot of invariants
outside of a big loop, and drastically increases the compile time in global
register splitting and copy coalescing. https://reviews.llvm.org/D49353
relieves the problem in global splitting. This patch is to handle the problem
in copy coalescing.
About the situation where the problem in copy coalescing happens. After
machineLICM, we have several defs outside of a big loop with hundreds or
thousands of uses inside the loop. Rematerialization in copy coalescing
happens for each use and everytime rematerialization is done, shrinkToUses
will be called to update the huge live interval. Because we have 'n' uses
for a def, and each live interval update will have at least 'n' complexity,
the total update work is n^2.
To fix the problem, we try to do the live interval update work in a collective
way. If a def has many copylike uses larger than a threshold, each time
rematerialization is done for one of those uses, we won't do the live interval
update in time but delay that work until rematerialization for all those uses
are completed, so we only have to do the live interval update work once.
Delaying the live interval update could potentially change the copy coalescing
result, so we hope to limit that change to those defs with many
(like above a hundred) copylike uses, and the cutoff can be adjusted by the
option -mllvm -late-remat-update-threshold=xxx.
Differential Revision: https://reviews.llvm.org/D49519
llvm-svn: 339035
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.
Reviewers: SjoerdMeijer, t.p.northover, javed.absar
Reviewed By: SjoerdMeijer
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D50202
llvm-svn: 339013
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.
llvm-svn: 339007
The patch was reverted because of bug detected by sanitizer. The bug is fixed,
respective tests added.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 339005
Multiple failues reported by sanitizer-x86_64-linux, seem to be caused by this
patch. Reverting to see if they sustain without it.
Differential Revision: https://reviews.llvm.org/D50172
llvm-svn: 338994
`isKnownNonNullFromDominatingCondition` is able to prove non-null basing on `br` or `guard`
by `%p != null` condition, but is unable to do so basing on `(%p != null) && %other_cond`.
This patch allows it to do so.
Differential Revision: https://reviews.llvm.org/D50172
Reviewed By: reames
llvm-svn: 338990
If there is a frequently taken branch dominated by a guard, and its condition is available
at the point of the guard, we can widen guard with condition of this branch and convert
the branch into unconditional:
guard(cond1)
if (cond2) {
// taken in 99.9% cases
// do something
} else {
// do something else
}
Converts to
guard(cond1 && cond2)
// do something
Differential Revision: https://reviews.llvm.org/D49974
Reviewed By: reames
llvm-svn: 338988
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338969
There are a bunch of edge cases and inconsistencies in how we're emitting sections
cause this warning to fire and it needs more work.
This reverts commit r335558.
llvm-svn: 338968
https://rise4fun.com/Alive/IT3
Comes up in the [most ugliest] signed int -> signed char case of
-fsanitize=implicit-conversion (https://reviews.llvm.org/D50250)
Not sure if we want to do it always, or only when it is free to invert.
llvm-svn: 338967
Summary:
Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function.
For example,
```
CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
return PreservedAnalyses::all();
```
After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call).
When optimizing SQLite with `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch.
Statistics:
```
Before the patch:
23010 dom-tree-stats - Number of DomTree recalculations
489264 dom-tree-stats - Number of nodes visited by DFS -- DomTree
After the patch:
21581 dom-tree-stats - Number of DomTree recalculations
475088 dom-tree-stats - Number of nodes visited by DFS -- DomTree
```
Reviewers: kuhar, dmgreen, brzycki, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49982
llvm-svn: 338954
These tests were clearly auto-generated when they were converted to
FileCheck back in r80019 (2009), but we didn't have a fancy script to
keep them up to date then. I've reviewed the diff, and we should be
generating the exact same code sequences we used to.
After this, I plan to commit a change that changes our output slightly,
but in a way that is still correct. It will generate a large diff, and I
want it to be clearly correct, so I am regenerating these checks in
preparation for that.
llvm-svn: 338928
- Remove -asm-verbose=0 from every llc command. The tests still pass.
- Reorder the RUN lines to match CHECKs.
- Use -LABEL like update_llc_test_checks.py does.
llvm-svn: 338927
Put the LLVM IR at the bottom of the function instead of the top. In my
next patch, I will run update_llc_test_checks.py on this file, and I
want to only highlight the diffs in the CHECK lines. Hopefully by doing
this change first, the patch will be more understandable.
llvm-svn: 338926
At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering.
So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.
Differential Revision: https://reviews.llvm.org/D50212
llvm-svn: 338925
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.
Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.
llvm-svn: 338910
Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag.
Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and.
I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful.
Differential Revision: https://reviews.llvm.org/D50165
llvm-svn: 338907
In r337830 I added SCEV checks to enable us to insert fewer bounds checks. Unfortunately, this sometimes crashes when multiple bounds checks are added due to SCEV caching issues. This patch splits the bounds checking pass into two phases, one that computes all the conditions (using SCEV checks) and the other that adds the new instructions.
Differential Revision: https://reviews.llvm.org/D49946
llvm-svn: 338902
We don't expect module names to be present in the index. This patch adds
DW_TAG_module to the blacklist.
Differential revision: https://reviews.llvm.org/D50237
llvm-svn: 338878
Some instructions expand to more than one decoder group.
This has been hitherto ignored, but is handled with this patch.
Review: Ulrich Weigand
https://reviews.llvm.org/D50187
llvm-svn: 338849
Basic pattern that leaves an unnecessary select on a rotation by zero result. This variant is trivial - the more general case with a compare+branch to prevent execution of undefined shifts is more tricky.
llvm-svn: 338833
There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.
The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.
llvm-svn: 338824
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.
To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64.
I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.
llvm-svn: 338821
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>.
The first patch is https://reviews.llvm.org/rL338485.
This patch handles code sequences that merges two values using `shl` and `or`, then extracts one value using `and`.
Differential Revision: https://reviews.llvm.org/D49981
llvm-svn: 338817
If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX.
llvm-svn: 338812
This one requires a bit of explaination. It's not every day you simply delete code to implement an optimization. :)
The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks. We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand. Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def. As a result, duplicating a potentially faulting instruction can not *introduce* a fault that didn't previously exist in the program.
The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]" which pulled this logic out of a common helper routine. As best I can tell, this check was originally added to the helper function for hoisting legality, later an incorrect fastpath for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path. This left the redundant check in the common code to pessimize sinking for no reason. I split it out in an NFC, and am not removing the unneccessary check. I wanted there to be something easy to revert in case I missed something.
Reviewed by: Anna Thomas (in person)
llvm-svn: 338794
Summary:
With Mach-O, there is a flag requirement discrepancy between working with
universal binaries and thin binaries. Many flags that don't require the `-macho`
flag (for example `-private-headers` and `-disassemble`) fail to work on
universal binaries unless `-macho` is given. When this happens, the error
message is unhelpful, stating:
The file was not recognized as a valid object file.
Which can lead to confusion.
This change allows generic flags to be used on universal binaries with and
without the `-macho` flag. This means flags that can be used for thin files can
be used consistently with fat files too.
To do this, the universal binary support within `ParseInputMachO()` is extracted
into a new function. This new function is called directly from `DumpInput()`
when the input binary is universal. Additionally the `-arch` flag validation in
`ParseInputMachO()` was extracted to be reused.
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: keith, llvm-commits
Differential Revision: https://reviews.llvm.org/D48702
llvm-svn: 338792
At least on ELF, it's impossible to tell from the object file whether
two globals with the same section marking were merged: the merged global
uses "private" linkage to hide its symbol, and the aliases look like
regular symbols. I can't think of any other reason to disallow it.
(Of course, we can only merge globals in the same section.)
The weird alignment handling matches AsmPrinter; our alignment handling
for global variables should probably be refactored.
Differential Revision: https://reviews.llvm.org/D49822
llvm-svn: 338791
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:
1. It sometimes gave invalid MIR where there is some control flow path
to the new implicit use of a register on EXIT_WWM that does not pass
through any def.
2. There were lots of false positives of registers that needed to have
an implicit use added to EXIT_WWM.
3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
before the WWM code, which I tried in order to fix (1)) caused lots
of the values to be spilled and reloaded unnecessarily.
This commit is a rework of SIFixWWMLiveness, with the following changes:
1. Instead of considering any register with a def that can reach the WWM
code and a def that can be reached from the WWM code, it now
considers three specific cases that need to be handled.
2. A register that needs liveness over WWM to be synthesized now has it
done by adding itself as an implicit use to defs other than the
dominant one.
Also added the following fixmes:
FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.
FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46756
Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47383
Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferablt)
by adding option -hexagon-initial-cfg-clenaup=0.
llvm-svn: 338774