Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.
In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.
Reviewers: xur, vsk
Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58737
llvm-svn: 355044
Summary:
The original assumption for the insertDef method was that it would not
materialize Defs out of no-where, hence it will not insert phis needed
after inserting a Def.
However, when cloning an instruction (use case used in LICM), we do
materialize Defs "out of no-where". If the block receiving a Def has at
least one other Def, then no processing is needed. If the block just
received its first Def, we must check where Phi placement is needed.
The only new usage of insertDef is in LICM, hence the trigger for the bug.
But the original goal of the method also fails to apply for the move()
method. If we move a Def from the entry point of a diamond to either the
left or right blocks, then the merge block must add a phi.
While this usecase does not currently occur, or may be viewed as an
incorrect transformation, MSSA must behave corectly given the scenario.
Resolves PR40749 and PR40754.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58652
llvm-svn: 355040
This restores the patch that splits demangled stdin input on
non-alphanumerics. I had reverted this patch earlier because it broke
Windows build-bots. I have updated the test so that it passes on
Windows.
I was running the test from powershell and never saw the issue until I
switched to the mingw shell.
This reverts commit 628ab5c682.
llvm-svn: 355031
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).
This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.
To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..
Previously landed
D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
D57596 [CodeGen] Be conservative about atomic accesses as for volatile
D57802 Be conservative about unordered accesses for the moment
rL353959: [Tests] First batch of cornercase tests for unordered atomics.
rL353966: [Tests] RMW folding tests w/unordered atomic operations.
rL353972: [Tests] More unordered atomic lowering tests.
rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
rL354740: [Hexagon, SystemZ] Be super conservative about atomics
rL354800: [Lanai] Be super conservative about atomics
rL354845: [ARM] Be super conservative about atomics
Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601
llvm-svn: 355025
This reverts commit 5cd5f8f256.
The test passes on linux, but fails on the windows build-bots.
This test failure seems to be a quoting issue between my test and
FileCheck on Windows. I'm reverting this patch until I can replicate
and fix in my Windows environment.
llvm-svn: 355021
A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node.
I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids the issue of handling a number of pre-legalization issues that I've encountered.
Differential Revision: https://reviews.llvm.org/D58583
llvm-svn: 355015
Summary:
This patch displays a hexadecimal section value (Elf_Shdr::sh_type) or section-relative offset when printing unknown sections.
Here is a subset of the output (ignoring the fields following "Type" when dumping an ELF's GNU `--section-headers` table).
Section Headers:
```
[Nr] Name Type
[16] android_rel LOOS+0x1
[17] android_rela LOOS+0x2
[27] unknown 0x1000: <unknown>
[28] loos LOOS+0
[30] hios VERSYM
[31] loproc LOPROC+0
[33] hiproc LOPROC+0xFFFFFFF
[34] louser LOUSER+0
[36] hiuser LOUSER+0x7FFFFFFF
```
As a comparison, the previous output looked something like the above, but with a blank "Type" field:
```
[Nr] Name Type
[27] unknown
[28] loos
[30] hios VERSYM
[31] loproc
[33] hiproc
[34] louser
[36] hiuser
```
This fixes PR40773
Reviewers: jhenderson, rupprecht, Bigcheese
Reviewed By: jhenderson, rupprecht, Bigcheese
Subscribers: MaskRay, Bigcheese, srhines, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58701
llvm-svn: 355014
The original intent was to enable this test for Windows; however, that
initial patch broke one of the build-bots. I temporarily disabled this
test on Windows until that issue was resolved. It was resolved in my
previous patch (cfd1d9742ee2d1b8dd6b7), and now I am re-enabling this
test.
llvm-svn: 355011
Ideally this is a NFCI, used single quotes in most cases. Hopefully
this will make the Windows bot happy.
I've marked this unsupported on windows, until I get my windows box
setup with this patch to test. I'll remove that constraint after I'm
confident this will pass on windows. I just want to silence the
buildbots for now.
llvm-svn: 355007
This patch adds testing of areas of the code that are not fully tested,
in particular dynamic table printing, ELF type printing, handling of
edge cases where things are missing/empty (relocations/program header
tables/section header table), and the --string-dump switch.
Reviewed by: grimar, higuoxing, rupprecht
Differential Revision: https://reviews.llvm.org/D58677
llvm-svn: 355003
Summary:
Before:
```
Dynamic Section:
NEEDED libpthread.so.0
...
NEEDED ld-linux-x86-64.so.2
RPATH 0x00000000001c2e61
```
After:
```
Dynamic Section:
NEEDED libpthread.so.0
...
NEEDED ld-linux-x86-64.so.2
RPATH $ORIGIN/../lib
```
Only a small problem here, I have no idea on choosing test case. I see there's a test
file(test/tools/llvm-objdump/private-headers-dynamic-section.test). But it has no DT_RPATH and DT_RUNPATH tags. Shall I replace the ELF file in the
Inputs dir by a new one?
Reviewers: jhenderson, grimar
Reviewed By: jhenderson
Subscribers: srhines, rupprecht, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58707
llvm-svn: 355001
Summary:
This patch attempts to replicate GNU c++-filt behavior when splitting stdin input for demangling.
Previously, cxx-filt would split input only on spaces. Each delimited item is then demangled.
From what I have tested, GNU c++filt also splits input on any character that does not make
up the mangled name (notably commas, but also a large set of non-alphanumeric characters).
This patch splits stdin input on any character that does not belong to the Itanium mangling
format (since Itanium is currently the only supported format in llvm-cxxfilt).
This is an update to PR39990
Reviewers: jhenderson, tejohnson, compnerd
Reviewed By: compnerd
Subscribers: erik.pilkington, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58416
llvm-svn: 354998
When using full LTO it is possible that template function definition DIE
is bound to one compilation unit and it's declaration to another. We should
add function declaration attributes on behalf of its owner CU otherwise
we may end up with malformed file identifier in function declaration
DW_AT_decl_file attribute.
Differential revision: https://reviews.llvm.org/D58538
llvm-svn: 354978
That patch is the fix for https://bugs.llvm.org/show_bug.cgi?id=40703
"wrong line number info for obj file compiled with -ffunction-sections"
bug. The problem happened with only .o files. If object file contains
several .text sections then line number information showed incorrectly.
The reason for this is that DwarfLineTable could not detect section which
corresponds to specified address(because address is the local to the
section). And as the result it could not select proper sequence in the
line table. The fix is to pass SectionIndex with the address. So that it
would be possible to differentiate addresses from various sections. With
this fix llvm-objdump shows correct line numbers for disassembled code.
Differential review: https://reviews.llvm.org/D58194
llvm-svn: 354972
llvm-readobj's error messages were broken for bad archive members. This
patch fixes them, and also adds testing for archive and thin archive
handling within llvm-readobj.
Reviewed by: rupprecht, grimar, higuoxing
Differential Revision: https://reviews.llvm.org/D58681
llvm-svn: 354960
Currently, the LLVM will print an error like
Unsupported relocation: try to compile with -O2 or above,
or check your static variable usage
if user defines more than one static variables in a single
ELF section (e.g., .bss or .data).
There is ongoing effort to support static and global
variables in libbpf and kernel. This patch removed the
assertion so user programs with static variables won't
fail compilation.
The static variable in-section offset is written to
the "imm" field of the corresponding to-be-relocated
bpf instruction. Below is an example to show how the
application (e.g., libbpf) can relate variable to relocations.
-bash-4.4$ cat g1.c
static volatile long a = 2;
static volatile int b = 3;
int test() { return a + b; }
-bash-4.4$ clang -target bpf -O2 -c g1.c
-bash-4.4$ llvm-readelf -r g1.o
Relocation section '.rel.text' at offset 0x158 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name
0000000000000000 0000000400000001 R_BPF_64_64 0000000000000000 .data
0000000000000018 0000000400000001 R_BPF_64_64 0000000000000000 .data
-bash-4.4$ llvm-readelf -s g1.o
Symbol table '.symtab' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c
2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a
3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test
-bash-4.4$ llvm-objdump -d g1.o
g1.o: file format ELF64-BPF
Disassembly of section .text:
0000000000000000 test:
0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
2: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0)
3: 18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00 r2 = 8 ll
5: 61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0)
6: 0f 10 00 00 00 00 00 00 r0 += r1
7: 95 00 00 00 00 00 00 00 exit
-bash-4.4$
. from symbol table, static variable "a" is in section #4, offset 0.
. from symbol table, static variable "b" is in section #4, offset 8.
. the first relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 0 (see llvm-objdump result)
. the second relocation is against symbol #4:
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
and in-section offset 8 (see llvm-objdump result)
. therefore, the first relocation is for variable "a", and
the second relocation is for variable "b".
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 354954
Some platforms, e.g. Windows, support backtraces but don't have
BACKTRACE. Checking for BACKTRACE prevents Windows from having
backtraces.
Patch by Jason Mittertreiner!
llvm-svn: 354951
Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only `end_try` -> `try` mapping but also `catch` ->
`try` mapping as well. If this is not created, `block` and `end_block`
markers later added may span across an existing `catch`, resulting in
the incorrect code like:
```
try
block --| (X)
catch |
end_block --|
end_try
```
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58605
llvm-svn: 354945
DWARFFormValues can be created from a data extractor or by passing its
value directly. Until now this was done by member functions that
modified an existing object's internal state. This patch replaces a
subset of these methods with static method that return a new
DWARFFormValue.
llvm-svn: 354941
Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
deleted.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58591
llvm-svn: 354939
Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).
This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().
Review: Ulrich Weigand.
llvm-svn: 354935
Splitting can make sanitizer errors harder to understand, as the
trapping instruction may not be in the function where the bug was
detected.
rdar://48142697
llvm-svn: 354931
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 354930
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.
Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.
Differential Revision: https://reviews.llvm.org/D58695
llvm-svn: 354926
Original implementation can't correctly handle __m256 and __m512 types
passed by reference through stack. This patch fixes it.
Patch by Wei Xiao!
Differential Revision: https://reviews.llvm.org/D57643
llvm-svn: 354921
Check that we do not crash if a parallelism group is explicitly set to
None. Permits usage of the following pattern.
[lit.common.cfg]
lit_config.parallelism_groups['my_group'] = None
if <condition>:
lit_config.parallelism_groups['my_group'] = 3
[project/lit.cfg]
config.parallelism_group = 'my_group'
Reviewers: rnk
Differential Revision: https://reviews.llvm.org/D58305
llvm-svn: 354912
Fix https://bugs.llvm.org/show_bug.cgi?id=38583: Describe
how memcpy/memmove/memset behave when len=0. Also fix
some fallout from when the alignment parameter was
replaced by an attribute.
This closes PR38583.
Patch by RalfJung (Ralf)
Differential Revision: https://reviews.llvm.org/D57600
llvm-svn: 354911
The previous sort comparator was not deterministic, i.e. in some
situations it would be possible for lhs < rhs && rhs < lhs. This was
discovered by an STL assertion in a Windows debug build of llvm-tblgen.
Differential Revision: https://reviews.llvm.org/D58687
llvm-svn: 354910
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html
We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).
That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.
The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.
llvm-svn: 354905
This patch enables the following
1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.
Reviewers: craig.topper
Differential Revision: https://reviews.llvm.org/D58343
llvm-svn: 354897
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.
Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.
A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().
Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.
Review: Ulrich Weigand
https://reviews.llvm.org/D58270
llvm-svn: 354896
Rotate is a special-case of funnel shift that has different
poison constraints than the general case. That's not visible
yet in the existing tests, but it needs to be corrected.
llvm-svn: 354894
Dispatch stall cycles may be associated to multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.
llvm-svn: 354877
This allows tools to parse/dump the architecture specific tags
like DT_MIPS_*, DT_PPC64_* and DT_HEXAGON_*
Also fixes a bug in DynamicTags.def which was revealed in this patch.
Differential revision: https://reviews.llvm.org/D58667
llvm-svn: 354876
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Differential Revision: https://reviews.llvm.org/D57680
llvm-svn: 354870
If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing a ADD/SUB and a SADDO/SSUBO and then compare the results.
I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit.
Differential Revision: https://reviews.llvm.org/D58637
llvm-svn: 354866
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.
This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.
Differential Revision: https://reviews.llvm.org/D58656
llvm-svn: 354846
As requested during review of D57601 <https://reviews.llvm.org/D57601> https://reviews.llvm.org/D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Differential Revision: https://reviews.llvm.org/D58490
Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk.
llvm-svn: 354845
Summary: We shouldn't delete elements while iterating a ranged for loop.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58519
llvm-svn: 354844
Summary:
- Indent check lines to easily figure out try-catch-end structure
- Add the original C++ code the tests were genereated from
- Add a few more lines to make the structure more readable
- Rename a couple function / structures
- Add label and branch annotations to cfg-stackify-eh.ll
- Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll`
because it will be updated in a later CL soon and there's no point of
making it look better here
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58562
llvm-svn: 354842
Summary:
The llvm-cov tool needs to be able to find coverage names in the
executable, so the .lprfn and .lcovmap sections cannot be merged into
.rdata.
Also, the linker merges .lprfn$M into .lprfn, so llvm-cov needs to
handle that when looking up sections. It has to support running on both
relocatable object files and linked PE files.
Lastly, when loading .lprfn from a PE file, llvm-cov needs to skip the
leading zero byte added by the profile runtime.
Reviewers: vsk
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58661
llvm-svn: 354840
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.
Depends on D56883
Replaces D56275
Reviewers: craig.topper, phil-opp
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D56944
llvm-svn: 354837
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.
llvm-svn: 354828
We have all the necessary legalization, expansion and unrolling support required for the *.overflow intrinsics with vector types, so update the docs to make that clear.
Note: vectorization is not in place yet (the non-homogenous return types aren't well supported) so we still must explicitly use the vectors intrinsics and not reply on slp/loop.
Differential Revision: https://reviews.llvm.org/D58618
llvm-svn: 354821
Summary:
In D58580 i have noted that `llvm::to_string()` is a memory hog.
It uses `raw_string_ostream`, and since it was buffered,
every `raw_string_ostream` had a cost of `BUFSIZ` bytes
(which is `8192` at least here). So every `llvm::to_string()`
call, even to just print an `int`, costed `8192` bytes.
In D58580, getting rid of that buffering //had// significant
performance and memory consumption improvements for `llvm-xray convert`.
Similarly, in D58580 @rnk pointed out that the `raw_svector_ostream`
is already unbuffered, and `write_unsigned_impl` and friends
do internal buffering. So it should be ok performance-wise to just
make the `raw_string_ostream` itself unbuffered.
Here, i don't have any perf measurements.
Another letdown is that i'm leaving a loose-end - not deleting the
`flush()` method. I don't expect that cleanup to be anything more
than just fixing every new compiler error, but i'm presently unable
to do that. Will look into that later.
Reviewers: rnk, zturner
Reviewed By: rnk
Subscribers: kristina, jdoerfert, llvm-commits, rnk
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58643
llvm-svn: 354819
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.
Differential Revision: https://reviews.llvm.org/D58475
llvm-svn: 354811
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.
To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.
Differential Revision: https://reviews.llvm.org/D58576
llvm-svn: 354808
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.
Differential Revision: https://reviews.llvm.org/D58528
llvm-svn: 354807
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.
llvm-svn: 354800
These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well.
We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that.
This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D58503
llvm-svn: 354797
Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold
Differential Revision: https://reviews.llvm.org/D58585
llvm-svn: 354793
This adds a few extra Thumb1 opcodes to improve the peephole opimisers
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.
Differential Revision: https://reviews.llvm.org/D58281
llvm-svn: 354791
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.
Differential Revision: https://reviews.llvm.org/D58616
llvm-svn: 354790
Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline.
Differential Revision: https://reviews.llvm.org/D57925
llvm-svn: 354774
Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.
Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.
Differential Revision: https://reviews.llvm.org/D58597
llvm-svn: 354771
Recently, support was added to yaml2obj to allow dynamic sections to
have a list of entries, to make it easier to write tests with dynamic
sections. However, this change also removed the ability to provide
custom contents to the dynamic section, making it hard to test
malformed contents (e.g. because the section is not a valid size to
contain an array of entries). This change reinstates this. An error is
emitted if raw content and dynamic entries are both specified.
Reviewed by: grimar, ruiu
Differential Review: https://reviews.llvm.org/D58543
llvm-svn: 354770
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.
In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.
(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)
So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.
Reviewers: SjoerdMeijer, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57823
llvm-svn: 354768
Summary:
This reverts D50129 / rL338834: [XRay][tools] Use Support/JSON.h in llvm-xray convert
Abstractions are great.
Readable code is great.
JSON support library is a *good* idea.
However unfortunately, there is an internal detail that one needs
to be aware of in `llvm::json::Object` - it uses `llvm::DenseMap`.
So for **every** `llvm::json::Object`, even if you only store a single `int`
entry there, you pay the whole price of `llvm::DenseMap`.
Unfortunately, it matters for `llvm-xray`.
I was trying to analyse the `llvm-exegesis` analysis mode performance,
and for that i wanted to view the LLVM X-Ray log visualization in Chrome
trace viewer. And the `llvm-xray convert` is sluggish, and sometimes
even ended up being killed by OOM.
`xray-log.llvm-exegesis.lwZ0sT` was acquired from `llvm-exegesis`
(compiled with ` -fxray-instruction-threshold=128`)
analysis mode over `-benchmarks-file` with 10099 points (one full
latency measurement set), with normal runtime of 0.387s.
Timings:
Old: (copied from D58580)
```
$ perf stat -r 5 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (5 runs):
21346.24 msec task-clock # 1.000 CPUs utilized ( +- 0.28% )
314 context-switches # 14.701 M/sec ( +- 59.13% )
1 cpu-migrations # 0.037 M/sec ( +-100.00% )
2181354 page-faults # 102191.251 M/sec ( +- 0.02% )
85477442102 cycles # 4004415.019 GHz ( +- 0.28% ) (83.33%)
14526427066 stalled-cycles-frontend # 16.99% frontend cycles idle ( +- 0.70% ) (83.33%)
32371533721 stalled-cycles-backend # 37.87% backend cycles idle ( +- 0.27% ) (33.34%)
67896890228 instructions # 0.79 insn per cycle
# 0.48 stalled cycles per insn ( +- 0.03% ) (50.00%)
14592654840 branches # 683631198.653 M/sec ( +- 0.02% ) (66.67%)
212207534 branch-misses # 1.45% of all branches ( +- 0.94% ) (83.34%)
21.3502 +- 0.0585 seconds time elapsed ( +- 0.27% )
```
New:
```
$ perf stat -r 9 ./bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT
Performance counter stats for './bin/llvm-xray convert -sort -symbolize -instr_map=./bin/llvm-exegesis -output-format=trace_event -output=/tmp/trace.yml xray-log.llvm-exegesis.lwZ0sT' (9 runs):
7178.38 msec task-clock # 1.000 CPUs utilized ( +- 0.26% )
182 context-switches # 25.402 M/sec ( +- 28.84% )
0 cpu-migrations # 0.046 M/sec ( +- 70.71% )
33701 page-faults # 4694.994 M/sec ( +- 0.88% )
28761053971 cycles # 4006833.933 GHz ( +- 0.26% ) (83.32%)
2028297997 stalled-cycles-frontend # 7.05% frontend cycles idle ( +- 1.61% ) (83.32%)
10773154901 stalled-cycles-backend # 37.46% backend cycles idle ( +- 0.38% ) (33.36%)
36199132874 instructions # 1.26 insn per cycle
# 0.30 stalled cycles per insn ( +- 0.03% ) (50.02%)
6434504227 branches # 896420204.421 M/sec ( +- 0.03% ) (66.68%)
73355176 branch-misses # 1.14% of all branches ( +- 1.46% ) (83.33%)
7.1807 +- 0.0190 seconds time elapsed ( +- 0.26% )
```
So using `llvm::json` nearly triples run-time on that test case.
(+3x is times, not percent.)
Memory:
Old:
```
total runtime: 39.88s.
bytes allocated in total (ignoring deallocations): 79.07GB (1.98GB/s)
calls to allocation functions: 33267816 (834135/s)
temporary memory allocations: 5832298 (146235/s)
peak heap memory consumption: 9.21GB
peak RSS (including heaptrack overhead): 147.98GB
total memory leaked: 1.09MB
```
New:
```
total runtime: 17.42s.
bytes allocated in total (ignoring deallocations): 5.12GB (293.86MB/s)
calls to allocation functions: 21382982 (1227284/s)
temporary memory allocations: 232858 (13364/s)
peak heap memory consumption: 350.69MB
peak RSS (including heaptrack overhead): 2.55GB
total memory leaked: 79.95KB
```
Diff:
```
total runtime: -22.46s.
bytes allocated in total (ignoring deallocations): -73.95GB (3.29GB/s)
calls to allocation functions: -11884834 (529155/s)
temporary memory allocations: -5599440 (249307/s)
peak heap memory consumption: -8.86GB
peak RSS (including heaptrack overhead): 0B
total memory leaked: -1.01MB
```
So using `llvm::json` increases *peak* memory consumption on *this* testcase ~+27x.
And total allocation count +15x. Both of these numbers are times, *not* percent.
And note that memory usage is clearly unbound with `llvm::json`, it directly depends
on the length of the log, so peak memory consumption is always increasing.
This isn't so with the dumb code, there is no accumulating memory consumption,
peak memory consumption is fixed. Naturally, that means it will handle *much*
larger logs without OOM'ing.
Readability is good, but the price is simply unacceptable here.
Too bad none of this analysis was done as part of the development/review D50129 itself.
Reviewers: dberris, kpw, sammccall
Reviewed By: dberris
Subscribers: riccibruno, hans, courbet, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58584
llvm-svn: 354764
OPC_CheckCondCode is always used as operand 2 of a setcc. And its always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
This saves ~3000 bytes in the X86 table.
llvm-svn: 354763
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D58430
llvm-svn: 354762
And regenerate checks. I had to rename some variables, because
update_test_checks can't deal with the same variable names used
in lower and upper case. I've also dropped the result type aliases,
as just using the type directly gives a cleaner result.
llvm-svn: 354759
Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.
llvm-svn: 354757
Summary:
Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58574
llvm-svn: 354755
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.
The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.
Reviewers: RKSimon, andreadb
Reviewed By: RKSimon
Subscribers: gbedwell, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58581
llvm-svn: 354754
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.
We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.
Reviewers: spatel, RKSimon, nikic
Reviewed By: nikic
Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58567
llvm-svn: 354753
add A, sext(B) --> sub A, zext(B)
We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.
The backend should be prepared for this change after:
D57401
rL353433
This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.
The seeming regression in the fuzzer test was discussed in:
D58359
We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.
llvm-svn: 354748
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.
Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.
The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.
This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486https://bugs.llvm.org/show_bug.cgi?id=31754
llvm-svn: 354746
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.
llvm-svn: 354740
Thread Twine a little deeper through the VFS to avoid unnecessarily
constructing the same std::string twice in a parameter sequence:
Twine -> std::string -> StringRef -> std::string
Changing a few parameters from StringRef to Twine avoids the early call
to `Twine::str()`.
llvm-svn: 354739
The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.
X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.
Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.
llvm-svn: 354738
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."
These were reverted in r354713 as their context depended on other patches that were reverted for a bug.
llvm-svn: 354734
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.
I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.
Differential Revision: https://reviews.llvm.org/D58575
llvm-svn: 354733
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD.
llvm-svn: 354729
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.
This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.
The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.
llvm-svn: 354724
Summary:
The objdump Mach-O parser uses MachOObjectFile::checkSymbolTable() to
verify the symbol table is in a legal state before dereferencing the
offsets in the table. This routine missed a test for N_STAB symbols
when validating the two-level name space library ordinal for undefined
symbols. If the binary in question contained a value in the n_desc high
byte that is larger than the list of loaded dylibs, checkSymbolTable()
will flag the library ordinal as being out of range. Most of the time
the n_desc field is set to 0 or to small values, but old final linked
binaries exist with N_STAB symbols bearing non-trivial n_desc fields.
The change here is simply to verify a symbol is not an N_STAB symbol
before consulting the values of n_other or n_desc.
rdar://44977336
Reviewers: lhames, pete, ab
Reviewed By: pete
Subscribers: llvm-commits, rupprecht
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58568
llvm-svn: 354722
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.
I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511
llvm-svn: 354713
If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.
Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.
llvm-svn: 354709
If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg.
If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half.
llvm-svn: 354708
We record the type of the symbol (event/function/data/global) in the
MCWasmSymbol and so it should always be clear how to handle a relocation
based on the symbol itself.
The exception is a function which still needs the special @TYPEINDEX
then the relocation contains the signature rather than the address
of the functions.
Differential Revision: https://reviews.llvm.org/D58472
llvm-svn: 354697
When LLVM_ENABLE_PROJECTS is set, CMake assumes the project
directories are all side-by-side. This is not always the case and
there's no reason to expect it if LLVM_EXTERNAL_<proj>_SOURCE_DIR is
set. Honor that setting if it exists and allow the build configuration
to continue.
Differential Revision: https://reviews.llvm.org/D49672
llvm-svn: 354693
Summary:
Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying
the TargetOptions::EnableIPRA. This no longer works on current trunk since the
useIPRA() hook overrides any values that are set in advance. This patch adjusts
the behaviour of the hook so that API users and useIPRA() can both enable it
but useIPRA() cannot disable it if the API user already enabled it.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D38043
llvm-svn: 354692
We need to enhance the uaddo matching to handle special-cases
as seen in PR40486 and PR31754. That means we won't necessarily
have a def-use pattern, so we'll need to check dominance to
determine where to place the intrinsic (as we already do for
usubo). This preliminary patch is just rearranging the code,
so the planned follow-up to improve uaddo will be more clear.
llvm-svn: 354689
Don't skip incrementing the frame index number
if the object is dead. Instructions can still be
referencing the old frame index number, and this
doesn't attempt to remap those. The resulting
MIR then fails to load because the use instructions
use a higher frame index number than recorded
list of stack objects.
I'm not sure it's possible to craft a testcase
with the existing set of passes. It requires
selectively marking some stack objects
dead in an essentially random order.
StackSlotColoring condenses towards
the low indexes. This avoids a regression in a
future AMDGPU commit when some frame indexes
are lowered separately from PEI.
llvm-svn: 354688
Will allow re-using the machinery for independent
sets of register allocators.
This will allow AMDGPU to use separate command line
options for the allocator to use for SGPRs separate
from VGPRs.
llvm-svn: 354687
This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect.
Differential Revision: https://reviews.llvm.org/D58393
llvm-svn: 354682
As discussed in:
D56864
D58197
Always use the narrow (128-bit) instruction when possible.
We already had the signed int version of this transform.
llvm-svn: 354675
Only the 1st fold is attempted pre-legalization, but it requires
legal (simple) types too, so we don't need an EVT in any of the code.
llvm-svn: 354674
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D58507
llvm-svn: 354672
This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D58096
llvm-svn: 354670
Summary:
This patch separates two semantics of `applyUpdates`:
1. User provides an accurate CFG diff and the dominator tree is updated according to the difference of `the number of edge insertions` and `the number of edge deletions` to infer the status of an edge before and after the update.
2. User provides a sequence of hints. Updates mentioned in this sequence might never happened and even duplicated.
Logic changes:
Previously, removing invalid updates is considered a side-effect of deduplication and is not guaranteed to be reliable. To handle the second semantic, `applyUpdates` does validity checking before deduplication, which can cause updates that have already been applied to be submitted again. Then, different calls to `applyUpdates` might cause unintended consequences, for example,
```
DTU(Lazy) and Edge A->B exists.
1. DTU.applyUpdates({{Delete, A, B}, {Insert, A, B}}) // User expects these 2 updates result in a no-op, but {Insert, A, B} is queued
2. Remove A->B
3. DTU.applyUpdates({{Delete, A, B}}) // DTU cancels this update with {Insert, A, B} mentioned above together (Unintended)
```
But by restricting the precondition that updates of an edge need to be strictly ordered as how CFG changes were made, we can infer the initial status of this edge to resolve this issue.
Interface changes:
The second semantic of `applyUpdates` is separated to `applyUpdatesPermissive`.
These changes enable DTU(Lazy) to use the first semantic if needed, which is quite useful in `transforms/utils`.
Reviewers: kuhar, brzycki, dmgreen, grosser
Reviewed By: brzycki
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58170
llvm-svn: 354669
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.
Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.
Differential Revision: https://reviews.llvm.org/D57833
llvm-svn: 354667
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.
For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.
llvm-svn: 354665
This fixes https://bugs.llvm.org/show_bug.cgi?id=40786
("obj2yaml symbol output missing section index for SHN_ABS and SHN_COMMON symbols")
Since SHN_ABS and SHN_COMMON symbols are special, we should preserve
the st_shndx for them. The patch does this for them and the other special symbols.
The test case is based on the test provided by James Henderson at the bug page!
Differential revision: https://reviews.llvm.org/D58498
llvm-svn: 354661
The correct edge being deleted is not to the unswitched exit block, but to the
original block before it was split. That's the key in the map, not the
value.
The insert is correct. The new edge is to the .split block.
The splitting turns OriginalBB into:
OriginalBB -> OriginalBB.split.
Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete
ParentBB->OriginalBB, not ParentBB->OriginalBB.split.
llvm-svn: 354656
When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits.
We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results.
So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR.
llvm-svn: 354655
If the scalar type doesn't divide evenly into the WideVT then the code will need to take some bits from adjacent scalar loads and combine them.
But most of our testing is for i1 element type which always divides evenly.
llvm-svn: 354653
Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those.
But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results.
By promoting the input at the same time we can create only a single any_extend or truncate.
There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch.
llvm-svn: 354647