In the past while, I've committed a number of patches in the PowerPC back end
aimed at eliminating comparison instructions. However, this causes some failures
in proprietary source and these issues are not observed in SPEC or any open
source packages I've been able to run.
As a result, I'm pulling the entire series and will refactor it to:
- Have a single entry point for easy control
- Have fine-grained control over which patterns we transform
A side-effect of this is that test cases for these patches (and modified by
them) are XFAIL-ed. This is a temporary measure as it is counter-productive
to remove/modify these test cases and then have to modify them again when
the refactored patch is recommitted.
The failure will be investigated in parallel to the refactoring effort and
the recommit will either have a fix for it or will leave this transformation
off by default until the problem is resolved.
llvm-svn: 314244
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.
The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
Reviewers:
zvi
guyblank
dorit
Ayal
Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.
This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.
Differential Revision: https://reviews.llvm.org/D38206
llvm-svn: 314233
When dsymutil generates the companion file, its strips all unnecessary
sections by omitting their body and setting the offset in their
corresponding load command to zero.
One such section is the .eh_frame section, as it contains runtime
information rather than debug information and is part of the __TEXT
segment. When reading this section, we would just read the number of
bytes specified in the load command, starting from offset 0 (i.e. the
beginning of the file).
Rather than trying to parse this obviously invalid section, dwarfdump
now skips this.
Differential revision: https://reviews.llvm.org/D38135
llvm-svn: 314208
This is a 2nd attempt at:
https://reviews.llvm.org/rL310055
...which was reverted at rL310123 because of PR34074:
https://bugs.llvm.org/show_bug.cgi?id=34074
In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the
earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store.
Original commit message (authored by Filipe Cabecinhas):
This fixes PR31777.
If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.
In the future we should also support vectors of integers. And maybe
float/double if we can.
Differential Revision: https://reviews.llvm.org/D30703
llvm-svn: 314206
This patch adds logic to follow a symbol's aliases when the symbol name
cannot be found in the current object file. It checks the main binary
for the symbol's address and queries the current object for its aliases
(symbols with the same address) before printing out a warning.
Differential revision: https://reviews.llvm.org/D38230
llvm-svn: 314198
Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
Adding autoUpgrade support.
Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
Differential Revision: https://reviews.llvm.org/D38220
llvm-svn: 314195
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
llvm-cov's report mode does not print any output when -show-functions is
specified and no source files are specified. This can be surprising, so
the tool should at least print out an error message when this happens.
rdar://problem/34636859
llvm-svn: 314175
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.
test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.
llvm-svn: 314169
R12 is used for the SwiftError parameter. It is no longer a CSR as it
is used for transfer the SwiftError, and the caller must preserve it if
they need to.
llvm-svn: 314165
As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seems to be some perturbance of register allocation.
Differential Revision: https://reviews.llvm.org/D38001
llvm-svn: 314152
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118
because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again.
Original commit message:
The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314147
This teach simplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior
This is a follow up of D34598
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314117
Summary:
This code iterates the 'Orders' vector in parallel with the DbgValue
list, emitting all DBG_VALUEs that occurred between the last IR order
insertion point and the next insertion point. This assumes the
SDDbgValue list is sorted in IR order, which it usually is. However, it
is not sorted when a node with a debug value is replaced with another
one. When this happens, TransferDbgValues is called, and the new value
is added to the end of the list.
The problem can be solved by stably sorting the list by IR order.
Reviewers: aprantl, Ka-Ka
Reviewed By: aprantl
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D38197
llvm-svn: 314114
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:
c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314106
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314073
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072
The code wasn't yelling at the user when there's a reference
from a DIGlobalVariableExpression. Thanks to Adrian for the
reduced testcase. Fixes PR34672.
llvm-svn: 314069
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.
From what I see, we're not likely to encounter this pattern in the
DAG because the upstream setcc combines from this don't (usually?)
produce this pattern. If we fix that, then this will become more
relevant. Since the cost of handling this case is just loosening
the predicate of the existing fold, we might as well do it now.
llvm-svn: 314064
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314062
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314060
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314055
Fixed suboptimal encoding of instruction memory operand when assembler is used to select 32 bit fixup rather than 8 bit immediate for encoding memory offset value.
Differential Revision: https://reviews.llvm.org/D38117
llvm-svn: 314044
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81.
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.
Differential Revision: https://reviews.llvm.org/D38181
llvm-svn: 314023
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
Reviewers: dberris, echristo
Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D38102
llvm-svn: 314005
Usually an intrinsic is a simple target instruction, it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency().
Differential Revision: https://reviews.llvm.org/D38104
llvm-svn: 314003
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 313993
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:
before (20 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ax
mov $0xffff,%cx
cmovne %ax,%cx
movzwl %cx,%eax
retq
after (18 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ecx
mov $0xffff,%eax
cmovne %ecx,%eax
retq
Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36711
llvm-svn: 313982
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.
For now the transformation is limited to the following case:
* The loop has a single latch with either ult or slt icmp condition.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is ult.
See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.
Reviewed By: sanjoy, mkazantsev
Differential Revision: https://reviews.llvm.org/D37569
llvm-svn: 313981
This patch re-commits the patch that was pulled out due to a
problem it caused, but with a fix for the problem. The fix
was reviewed separately by Eric Christopher and Hal Finkel.
Differential Revision: https://reviews.llvm.org/D38054
llvm-svn: 313978
For the following function:
double fn1(double d0, double d1, double d2) {
double a = -d0 - d1 * d2;
return a;
}
on ARM, LLVM generates code along the lines of
vneg.f64 d0, d0
vmls.f64 d0, d1, d2
i.e., a negate and a multiply-subtract.
The attached patch adds instruction selection patterns to allow it to generate the single instruction
vnmla.f64 d0, d1, d2
(multiply-add with negation) instead, like GCC does.
Committed on behalf of @gergo- (Gergö Barany)
Differential Revision: https://reviews.llvm.org/D35911
llvm-svn: 313972
Summary: Previously we would dereference Symtab without checking for null.
Reviewers: davide, atanasyan, rafael
Reviewed By: davide, atanasyan
Differential Revision: https://reviews.llvm.org/D38080
llvm-svn: 313970
This patch adds the -o and --out-file options for compatibility with
Darwin's dwarfdump.
Differential revision: https://reviews.llvm.org/D38125
llvm-svn: 313969
This patch adds instruction patterns for operations in BPF_ALU. After this,
assembler could recognize some 32-bit ALU statement. For example, those listed
int the unit test file.
Separate MOV patterns are unnecessary as MOV is ALU operation that could reuse
ALU encoding infrastructure, this patch removed those redundant patterns.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 313961
The previous SwiftCC support for AAPCS64 was partially correct. It
setup swiftself parameters in the proper register but failed to setup
swifterror in the correct register. This would break compilation of
swift code for non-Darwin AAPCS64 conforming environments.
llvm-svn: 313956
There were two issues, one Python 3 specific related to Unicode,
and another which is that the tool substitution for lld no longer
rejected matches where a / preceded the tool name.
llvm-svn: 313928
in the second slice of a Mach-O universal file.
The code in llvm-objdump in in DisassembleMachO() was getting the default
CPU then incorrectly setting into the global variable used for the -mcpu option
if that was not set. This caused a second call to DisassembleMachO() to use
the wrong default CPU when disassembling the next slice in a Mach-O universal
file. And would result in bad disassembly and an error message about an
recognized processor for the target:
% llvm-objdump -d -m -arch all fat.macho-armv7s-arm64
fat.macho-armv7s-arm64 (architecture armv7s):
(__TEXT,__text) section
armv7:
0: 60 47 bx r12
fat.macho-armv7s-arm64 (architecture arm64):
'cortex-a7' is not a recognized processor for this target (ignoring processor)
'cortex-a7' is not a recognized processor for this target (ignoring processor)
(__TEXT,__text) section
___multc3:
0: .long 0x1e620810
rdar://34439149
llvm-svn: 313921
debuginfo-tests has need to reuse a lot of common configuration
from clang and lld, and in general it seems like all of the
projects which are tightly coupled (e.g. lld, clang, llvm, lldb,
etc) can benefit from knowing about one other. For example,
lldb needs to know various things about how to run clang in its
test suite. Since there's a lot of common substitutions and
operations that need to be shared among projects, sinking this
up into LLVM makes sense.
In addition, this patch introduces a function add_tool_substitution
which handles all the dirty intricacies of matching tool names
which was previously copied around the various config files. This
is now a simple straightforward interface which is hard to mess
up.
Differential Revision: https://reviews.llvm.org/D37944
llvm-svn: 313919
Summary:
Avoid using XZR/WZR directly as operands to split stores of zero
vectors. Doing so can lead to the XZR/WZR being used by an instruction
that doesn't allow it (e.g. add).
Fixes bug 34674.
Reviewers: t.p.northover, efriedma, MatzeB
Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38146
llvm-svn: 313916
This patch adds dumping of line table instructions as well as the final
state at each specified pc value in verbose mode. This is essentially
the same as the default in Darwin's dwarfdump. Dumping the actual line
table opcodes can be particularly useful for something like debugging a
bad `.debug_line` section.
Differential revision: https://reviews.llvm.org/D37971
llvm-svn: 313910
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
int x[2];
useit(x);
}
void f() {
inlineme();
inlineme();
}
llvm-svn: 313905
Summary:
SelectionDAGISel::LowerArguments is associating arguments
with frame indices (FuncInfo->setArgumentFrameIndex). That
information is later on used by EmitFuncArgumentDbgValue to
create DBG_VALUE instructions that denotes that a variable
can be found on the stack.
I discovered that for our (big endian) out-of-tree target
the association created by SelectionDAGISel::LowerArguments
sometimes is wrong. I've seen this happen when a 64-bit value
is passed on the stack. The argument will occupy two stack
slots (frame index X, and frame index X+1). The fault is
that a call to setArgumentFrameIndex is associating the
64-bit argument with frame index X+1. The effect is that the
debug information (DBG_VALUE) will point at the least significant
part of the arguement on the stack. When printing the
argument in a debugger I will get the wrong value.
I managed to create a test case for PowerPC that seems to
show the same kind of problem.
The bugfix will look at the datalayout, taking endianness into
account when examining a BUILD_PAIR node, assuming that the
least significant part is in the first operand of the BUILD_PAIR.
For big endian targets we should use the frame index from
the second operand, as the most significant part will be stored
at the lower address (using the highest frame index).
Reviewers: bogner, rnk, hfinkel, sdardis, aprantl
Reviewed By: aprantl
Subscribers: nemanjai, aprantl, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37740
llvm-svn: 313901
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Differential Revision: https://reviews.llvm.org/D34815
llvm-svn: 313886
In case of using a "nested" relocation expressions like this
`%hi(%neg(%gp_rel()))`, N32 ABI requires generation of three consecutive
relocations. That differs from the N64 ABI case where all relocations
are packed into the single relocation record.
llvm-svn: 313879
More conversions to load-and-test can be made with this patch by adding a
forward search in optimizeCompareZero().
Review: Ulrich Weigand
https://reviews.llvm.org/D38076
llvm-svn: 313877
.. as well as the two subsequent changes r313826 and r313875.
This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).
llvm-svn: 313876
Summary:
There already was code that tried to remove the dbg.declare, but that code
was placed after we had called
I->replaceAllUsesWith(UndefValue::get(I->getType()));
on the alloca, so when we searched for the relevant dbg.declare, we
couldn't find it.
Now we do the search before we call RAUW so there is a chance to find it.
An existing testcase needed update due to this. Two dbg.declare with undef
were removed and then suddenly one of the two CHECKS failed.
Before this patch we got
call void @llvm.dbg.declare(metadata i24* undef, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
call void @llvm.dbg.declare(metadata %struct.prog_src_register* undef, metadata !14, metadata !DIExpression()), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
and with it we get
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
However, the CHECKs in the testcase checked things in a silly order, so
they only passed since they found things in the first dbg.declare. Now
we changed the order of the checks and the test passes.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37900
llvm-svn: 313875
The N32 ABI uses RELA relocation format, do not use 3-in-1 relocation's
encoding, and uses ELFCLASS32. This change passes the `IsN32` flag
to the `MCAsmBackend` to distinguish usage of N32 ABI.
We still do not handle some cases like providing the `-target-abi=o32`
command line option with the `mips64` target triple. That's why
elf_header.s contains some "FIXME" strings. This case will be fixed in
a separate patch.
Differential revision: https://reviews.llvm.org/D37960
llvm-svn: 313873
This patch prevents dsymutil from resolving a reference to a NULL DIE
when a bogus reference happens to be coincidentally referencing a NULL
DIE. Now this is detected as an invalid reference and a warning is
printed.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=33873
Differential revision: https://reviews.llvm.org/D38078
llvm-svn: 313872
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.
Patch by: Nikola Prica.
llvm-svn: 313870
Revert the patch causing the functional failures.
The patch owner is notified with test cases which fail.
Test case has been provided to Maxim offline.
llvm-svn: 313857
Passing "-dump" to llvm-cov will now print more detailed information
about function hash and counter mismatches. This should make it easier
to debug *.profdata files which contain incorrect records, and to debug
other scenarios where coverage goes missing due to mismatch issues.
llvm-svn: 313853
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.
llvm-svn: 313852
Many editors and Python-related diagnostics tools such as
debuggers break or fail in mysterious ways when python files
don't end in .py. This is especially true on Windows, but
still exists on other platforms. I don't want to be too heavy
handed in changing everything across the board, but I do want
to at least *allow* lit configs to have .py extensions. This
patch makes the discovery process first look for a config file
with a .py extension, and if one is not found, then looks for
a config file using the old method. So for existing users, there
should be no functional change.
Differential Revision: https://reviews.llvm.org/D37838
llvm-svn: 313849
Summary:
This manifested itself in lld since it meant that weak
symbols were not appearing in archive symbol tables.
Subscribers: jfb, dschuff, jgravelle-google, aheejin
Differential Revision: https://reviews.llvm.org/D38111
llvm-svn: 313838
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG')
into hiding, but I'll try to clean that up with a follow-up patch anyway.
llvm-svn: 313829
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html
This is tracked as PR34136
llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.
The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.
Reviewers: aprantl, dblaikie, probinson
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37768
llvm-svn: 313825
This reverts commit SVN r313668. The original test case attempted to
write a pointer value into 16-bits, although the value may exceed the
range representable in 16-bits. Ensure that the symbol is located in
the address space such that its absolute address is representable in
16-bits. This should fix the assertion failure that was seen on the
Windows hosts.
llvm-svn: 313822
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1)
Differential Revision: https://reviews.llvm.org/D38065
llvm-svn: 313819
The Swift CC is identical to Win64 CC with the exception of swift error
being passed in r12 which is a CSR. However, since this calling
convention is only used in swift -> swift code, it does not impact
interoperability and can be treated entirely as Win64 CC. We would
previously incorrectly lower the frame setup as we did not treat the
frame as conforming to Win64 specifications.
llvm-svn: 313813
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.
llvm-svn: 313806
Add adds support for naming data segments. This is useful
useful linkers so that they can merge similar sections.
Differential Revision: https://reviews.llvm.org/D37886
llvm-svn: 313795
Add support for passing SwiftError through a register on the Windows x64
calling convention. This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter. This partially enables building the swift standard library
for Windows x86_64.
llvm-svn: 313791
This enables readobj to output Windows resource files (.res). This way,
we'll be able to test .res outputs without comparing them byte-by-byte
with "magic binary files" generated by MS toolchain.
Differential Revision: https://reviews.llvm.org/D38058
llvm-svn: 313790
After r313775, it's easier to maintain a parallel BitVector of spilled
locations indexed by location number.
I wasn't able to build a good reduced test case for this iteration of
the bug, but I added a more direct assertion that spilled values must
use frame index locations. If this bug reappears, it won't only fire on
the NEON vector code that we detected it on, but on medium-sized
integer-only programs as well.
llvm-svn: 313786
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
This patch implements the Darwin dwarfdump option --recurse-depth=<N>,
which limits the recursion depth when selectively printing DIEs at an
offset.
Differential Revision: https://reviews.llvm.org/D38064
llvm-svn: 313778
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.
Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019
llvm-svn: 313774
I did not upload two binaries that I reference in tests.
This change adds support for sections involved in dynamic loading such
as SHT_DYNAMIC, SHT_DYNSYM, and allocated string tables.
The two added binaries used for tests can be downloaded here and here
Differential Revision: https://reviews.llvm.org/D36560
llvm-svn: 313772
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
I overzealously landed this before I was sure that another change
wouldn't break the build that this change depends on.
This change adds support for sections involved in dynamic loading such
as SHT_DYNAMIC, SHT_DYNSYM, and allocated string tables.
The two added binaries used for tests can be downloaded here and here
Differential Revision: https://reviews.llvm.org/D36560
llvm-svn: 313767
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.
Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D38086
llvm-svn: 313766
Remove unneeded attributes from test/DebugInfo/Generic/imported-name-inlined.ll because it was causing failures on pure MIPS builds.
Patch by Miloš Stojanović!
Differential Revision: https://reviews.llvm.org/D38079
llvm-svn: 313762
This version of the patch fixes an off-by-one error causing PR34596. We
do not need to use std::next(BlockIter) when calling updateDepths, as
BlockIter already points to the next element.
Original commit message:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 313751
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736