Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6936
llvm-svn: 228263
Parts of llvm were not expecting it and we wouldn't print
the entity size of the section.
Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.
Fixes pr22463.
llvm-svn: 228196
v2i32, i32, trunc i32 to i16, and truc i32 to i8 stores are legal for
all address spaces. We had marked them as custom in order to lower
them for the private address space, but this is no longer necessary.
This enables lowering of misaligned stores of these types in the
DAGLegalizer.
llvm-svn: 228189
We were previously doing a post-order traversal and operating on the
list in reverse, however this would occasionaly cause backedges for
loops to be visited before some of the other blocks in the loop.
We know use a reverse post-order traversal, which avoids this issue.
The reverse post-order traversal is not completely ideal, so we need
to manually fixup the list to ensure that inner loop backedges are
visited before outer loop backedges.
llvm-svn: 228186
Track unresolved nodes under distinct `MDNode`s during `MapMetadata()`,
and resolve them at the end. Previously, these cycles wouldn't get
resolved.
llvm-svn: 228180
In case CSE reuses a previoulsy unused register the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.
This fixes PR22439 and the corresponding rdar://19694987
Differential Revision: http://reviews.llvm.org/D7395
llvm-svn: 228178
This is a bug that was caused due to storing the feature bitset in a 32-bit
variable when it is a 64-bit mask, discarding the top half of the feature set.
llvm-svn: 228151
Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp
load balancing pass isn't enabled for Cortex-A72 as it's not
profitable to have it enabled for this core.
Patch by Ranjeet Singh.
llvm-svn: 228140
This associates movss and movsd with the packed single and packed double
execution domains (resp.). While this is largely cosmetic, as we now
don't have weird ping-pong-ing between single and double precision, it
is also useful because it avoids the domain fixing algorithm from seeing
domain breaks that don't actually exist. It will also be much more
important if we have an execution domain default other than packed
single, as that would cause us to mix movss and movsd with integer
vector code on a regular basis, a very bad mixture.
llvm-svn: 228135
version of the script.
Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
(yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues
Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.
llvm-svn: 228132
This reverts patches 223862, 224198, 224203, and 224754, which were all
related to the vector load/store combining and were reverted/reaplied
a few times due to the same alignment problems we're seeing now.
Further tests, mainly self-hosting Clang, will be needed to reapply this
patch in the future.
llvm-svn: 228129
zero for v8i16 as well.
These exhibit the same domain badness, but also exhibit other weaknesses
in our blend lowering. More fixes to come.
llvm-svn: 228126
This is the simplest form of bit-math based blending which only fires
when we are blending with zero and is relatively profitable. I've only
enabled this path on very specific lowering strategies. I'm planning to
widen its applicability in subsequent patches, but so far you'll notice
that even though we get fewer shufps instructions, we *still* do the bit
math in the FP execution port. I'm looking into why this is still
happening.
llvm-svn: 228124
The ARM assembler allows register alias redefinitions as long as it
targets the same register. r222319 broke that. In the AArch64 case
it would just produce a new warning, but in the ARM case it would
error out on previously accepted assembler.
llvm-svn: 228109
update_llc_test_checks.py.
The exact format of the checks has changed over time. This includes
different indenting rules, new shuffle comments that have been added,
and more operand hiding behind regular expressions.
No functional change to the tests are expected here, but this will make
subsequent patches have a clean diff as they change shuffle lowering.
llvm-svn: 228097
update_llc_test_checks.py script uses, and refresh the checks in this
test.
No functionality changed here, just bringing this test up to work with
automated updates using the python script.
llvm-svn: 228096
This will make it easy to update as I change some parts of the X86
backend, makes it more clear what instruction differences are
introduced, and I find it makes it a bit easier to read as well.
llvm-svn: 228095
This preserves the handy functionality of force-enabling the MachineVerifier, without the need to embed usage of environment variables in LLVM client applications.
llvm-svn: 228079
Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.
Differential Revision: http://reviews.llvm.org/D6649
llvm-svn: 228047
Patch by Kit Barton.
Add the vector population count instructions for byte, halfword, word,
and doubleword sizes. There are two major changes here:
PPCISelLowering.cpp: Make CTPOP legal for vector types.
PPCRegisterInfo.td: Added v2i64 to the VRRC register
definition. This is needed for the doubleword variations of the
integer ops that were added in P8.
Test Plan
Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s
Test the generation of the vpopcnt instructions for various vector
data types. When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types. Testing for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions. There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.
llvm-svn: 228046
with 'stress' to indicate that the specific output isn't interesting and
relax them to only check the last instruction (a ret).
I've updated the one test case that really uses this to name the one
'stress_test' which was actually producing output we can directly check.
With this, the script doesn't introduce noise when run over the v16 test
file.
llvm-svn: 228033
This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend.
llvm-svn: 228022
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,
LLVM unrolls
#pragma unroll
foo (int i = 0; i < 3; ++i) {
sum += foo((b + i) * s);
}
into
sum += foo(b * s);
sum += foo((b + 1) * s);
sum += foo((b + 2) * s);
However, no optimizations yet reduce the internal redundancy of the three
expressions:
b * s
(b + 1) * s
(b + 2) * s
With SLSR, LLVM can optimize these three expressions into:
t1 = b * s
t2 = t1 + s
t3 = t2 + s
This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.
Test Plan: test/StraightLineStrengthReduce/slsr.ll
Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick
Reviewed By: jholewinski, atrick
Subscribers: karthikthecool, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7310
llvm-svn: 228016
This patch detects consecutive vector loads using the existing
EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329
This patch effectively reverts the tablegen additions of D6492 /
http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack.
The test cases that were added with that patch are simply modified to load
from varying offsets of a base pointer. These loads did not match the existing
tablegen patterns.
A happy side effect of doing this optimization earlier is that we can now fold
the load into a math op where possible; this is shown in some of the updated
checks in the test file.
Differential Revision: http://reviews.llvm.org/D7303
llvm-svn: 228006
This can happen when a REV instruction is commuted.
The trick is not to define the _vi versions of instructions, which has these
consequences:
- code generation will always fail if a pseudo cannot be lowered
(very useful to catch bugs where an unsupported instruction somehow makes
it to the printer)
- ability to query if a pseudo can be lowered, which is done in commuteOpcode
to prevent REV from commuting to non-REV on VI
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 227990
This fixes a hang when using an empty geometry shader.
v2: - don't add s_nop when followed by s_waitcnt
- comestic changes
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 227986
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant.
The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.
This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).
This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.
Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.
llvm-svn: 227983
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.
In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations. This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3. This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.
In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering. This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.
However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy. The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity. It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves. MachineCSE
does not have any machinery to common idempotent calls. This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace. In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.
I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes. However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.
All of which leads to the design presented here. This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering. Virtual registers
are used until prior to register allocation. At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them. The register allocator then coalesces these copies away. With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).
One additional problem must be dealt with: After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead. Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.
One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes. Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address. This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.
Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not. I've fixed the test case to reflect this.
There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.
I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.
llvm-svn: 227976
This test was checking for lack of a "movaps" (an aligned load)
rather than a "movups" (an unaligned load). It also included
a store which complicated the checking.
Add specific CPU runs to prevent subtarget feature flag overrides
from inhibiting this optimization.
llvm-svn: 227972
Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns
between x86mmx and i32 with more layers of indirection.
Before:
movq2dq %mm0, %xmm0
movd %xmm0, %eax
After:
movd %mm0, %eax
llvm-svn: 227969
LLVM ToT produces poor MMX code compared to 3.5. However, part of the previous
functionality can be achieved by using -x86-experimental-vector-widening-legalization.
Add tests to be sure we don't regress again.
llvm-svn: 227869
ObjectLinkingLayer.
There are a two of overloads for addObject, one of which transfers ownership of
the underlying buffer to OrcMCJITReplacement. This commit makes the ownership
transfering version pass ownership down to the ObjectLinkingLayer in order to
prevent the issue described in r227778.
I think this commit will fix the sanitizer bot failures that necessitated the
removal of the load-object-a.ll regression test in r227785, so I'm reinstating
that test.
llvm-svn: 227845
described by integer constants. This is a bit ugly, but if the source
language allows arbitrary type casting, the debug info must follow suit.
For example:
void foo() {
float a;
*(int *)&a = 0;
}
For the curious: SROA replaces the float alloca with an i32 alloca, which
is then optimized away and described via dbg.value(i32 0, ...).
llvm-svn: 227827
This is true for SI only. CI+ supports unaligned memory accesses,
but this requires driver support, so for now we disallow unaligned
accesses for all GCN targets.
llvm-svn: 227822
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.
Differential Revision: http://reviews.llvm.org/D7307
llvm-svn: 227820
Summary:
Previously it only avoided optimizing signed comparisons to 0.
Sometimes the DAGCombiner will optimize the unsigned comparisons
to 0 before it gets to the peephole pass, but sometimes it doesn't.
Fix for PR22373.
Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll
Reviewers: jfb, manmanren
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D7274
llvm-svn: 227809
The commit r225977 uncovered this bug. The problem was that the vectorizer tried to
read the second operand of an already deleted instruction.
The bug didn't show up before r225977 because the freed memory still contained a non-null pointer.
With r225977 deletion of instructions is delayed and the read operand pointer is always null.
llvm-svn: 227800
The VSX store instructions were also picking up an implicit "may read" from the
default pattern, which was an intrinsic (and we don't currently have a way of
specifying write-only intrinsics).
This was causing MI verification to fail for VSX spill restores.
llvm-svn: 227759
isel is actually a cracked instruction on the P7/P8, and must start a dispatch
group. The scheduling model should reflect this so that we don't bunch too many
of them together when possible.
Thanks to Bill Schmidt and Pat Haugen for helping to sort this out.
llvm-svn: 227758
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
(Re-commit of r227728)
Differential Revision: http://reviews.llvm.org/D6789
llvm-svn: 227752
The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).
The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.
llvm-svn: 227745
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
Differential Revision: http://reviews.llvm.org/D6789
llvm-svn: 227728
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.
llvm-svn: 227726
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.
llvm-svn: 227725
over declarations.
This is both quite unproductive and causes things to crash, for example
domtree would just assert.
I've added a declaration and a domtree run to the basic high-level tests
for the new pass manager.
llvm-svn: 227724
produce it.
This adds a function to the TargetMachine that produces this analysis
via a callback for each function. This in turn faves the way to produce
a *different* TTI per-function with the correct subtarget cached.
I've also done the necessary wiring in the opt tool to thread the target
machine down and make it available to the pass registry so that we can
construct this analysis from a target machine when available.
llvm-svn: 227721
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.
This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.
Test Plan: test/CodeGen/NVPTX/nounroll.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7041
llvm-svn: 227703
aggregate or scalar, the debug info needs to refer to the absolute offset
(relative to the entire variable) instead of storing the offset inside
the smaller aggregate.
llvm-svn: 227702
This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd).
Also adds shuffle mask decodes for integer loads (movd/movq).
Differential Revision: http://reviews.llvm.org/D7228
llvm-svn: 227688
Add a trivial binary (int main() { return 0; }) built for Windows on ARM to
ensure that we can correctly identify ARM_MOV32(T) base relocations. Addresses
post-commit review comments.
llvm-svn: 227673
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
Now that -mstack-probe-size is piped through to the backend via the function
attribute as on Windows x86, honour the value to permit handling of non-default
values for stack probes. This is needed /Gs with the clang-cl driver or
-mstack-probe-size with the clang driver when targeting Windows on ARM.
llvm-svn: 227667
segname,sectname to specify a Mach-O section to print. The printing is based on
the section type or section attributes.
The printing of the module initialization and termination section types is printed
with this change. Printing of other section types will be added next.
llvm-svn: 227649
Same sort of bug as on ARM where the cmp+branch are lowered to br_cc
(choosing the branch's debugloc for the br_cc's debugloc) then expanded
out to a cmp and a br, but both using the debug loc of the br_cc, thus
losing fidelity.
llvm-svn: 227645
Patch by: Igor Laevsky
"Statepoint verifier tests were using wrong names for the statepoint and gc.relocate intrinsics. This change renames them to use correct names and fixes all uncovered issues."
Differential Revision: http://reviews.llvm.org/D7266
llvm-svn: 227636
Some of those didn't even have run lines: they were removed
inadvertently during the Great Merge of 2014.
They used to check for DUPs, but now we go through W-regs?
Filed PR22418 for that potential regression.
For now, just make the tests explicit, so we now where we stand.
llvm-svn: 227635
Also revert r227489 since it didn't actually fix the thing I thought I
was fixing (since the test case was targeting the wrong architecture
initially). The change might be correct & demonstrated by other test
cases, but it's not a priority for me to find those test cases right
now.
Filed PR22417 for the failure.
llvm-svn: 227632
MSDN's x64 software conventions page says that this is one of the fixed
list of legal epilogues:
https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx
Presumably this is how the unwinder distinguishes epilogue jumps from
in-function control flow.
Also normalize the way we place "## TAILCALL" comments on such jumps.
llvm-svn: 227611
If the original FPU specification involved a restricted VFP unit (d16), ensure
that we reset the functionality when we encounter a new FPU type. In
particular, if the user specified vfpv3-d16, but switched to a VFPv3 (which has
32 double precision registers), we would fail to reset the D16 feature, and
treat it as being equivalent to vfpv3-d16.
llvm-svn: 227603
The FPU directive permits the user to switch the target FPU, enabling
instructions that would be otherwise unavailable. However, when configuring the
new subtarget features, we would not enable the implied functions for newer
FPUs. This would result in invalid rejection of valid input. Ensure that we
inherit the implied FPU functionality when enabling newer versions of the FPU.
Fortunately, these are mostly hierarchical, unlike the CPUs.
Addresses PR22395.
llvm-svn: 227584
Summary:
This is needed by the .cprestore assembler directive.
This directive needs to be able to insert an LW instruction after every JALR replacement of a JAL pseudo-instruction
(and never after a JALR which has NOT been a result of a pseudo-instruction replacement).
The problem with using InstAlias for these is that after it replaces the pseudo-instruction, we can't find out if the resulting JALR instruction
was generated by an InstAlias or not, so we don't know whether or not to insert our LW instruction.
By replacing it manually, we know when the pseudo-instruction replacement happens and we can insert the LW instruction correctly.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D5601
llvm-svn: 227568
Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support
arbitrary constant steps.
Initial patch by Alexey Volkov. Some bug fixes are added in the following version.
Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193
llvm-svn: 227557
accumulateAndSortLibcalls in LTOCodeGenerator.cpp collects names of runtime
library functions which are used to identify user-defined functions that should
be protected. Previously, this function would only scan the TargetLowering
object belonging to the "main" subtarget for the library function names. This
commit changes it to scan all per-function subtargets.
Differential Revision: http://reviews.llvm.org/D7275
llvm-svn: 227533
In the large code model, we now put __chkstk in %r11 before calling it.
Refactor the code so that we only do this once. Simplify things by using
__chkstk_ms instead of __chkstk on cygming. We already use that symbol
in the prolog emission, and it simplifies our logic.
Second half of PR18582.
llvm-svn: 227519
win64: Call __chkstk through a register with the large code model
Fixes half of PR18582. True dynamic allocas will still have a
CALL64pcrel32 which will fail.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7267
llvm-svn: 227503
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.
This patch disallows propagating zero constants in equality comparisons.
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376
Differential Revision: http://reviews.llvm.org/D7257
llvm-svn: 227491
Add tests for the various combines. This should
always be at least cycle neutral on all subtargets for f64,
and faster on some. For f32 we should prefer selecting
v_mad_f32 over v_fma_f32.
llvm-svn: 227484
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue. This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/substracted with a single operation.
Differential Revision: http://reviews.llvm.org/D7226
llvm-svn: 227458
Patch by Nemanja Ivanovic.
As was uncovered by the failing test case (when run on non-PPC
platforms), the feature set when compiling with -march=ppc64le was not
being picked up. This change ensures that if the -mcpu option is not
specified, the correct feature set is picked up regardless of whether
we are on PPC or not.
llvm-svn: 227455
ELF has support for sections that can be split into fixed size or
null terminated entities.
Since these sections can be split by the linker, it is not necessary
to split them in codegen.
This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.
The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.
The flip side is the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.
The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.
llvm-svn: 227431
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.
Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors. Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception. Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7216
llvm-svn: 227405
Patch by: Igor Laevsky <igor@azulsystems.com>
"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."
Differential Revision: http://reviews.llvm.org/D7157
llvm-svn: 227390
It's an empty shell for now. It's main method just opens the debug
map objects and parses their Dwarf info. Test that we at least do
that correctly.
llvm-svn: 227337
Summary:
MetadataAsValue uses a canonical format that strips the MDNode if it
contains only a single constant value. This triggers an assertion when
trying to cast the value to a MDNode.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7165
llvm-svn: 227319
The `Addend` is an optional field of the `Relocation` YAML record. But
we do not provide its default value while reading it from a YAML file
and so it might keep uninitialized.
I am going to fix the code by a separate commit. We might either make
this field mandatory (at least for .rela sections) or specify 0 as
a default value explicitly.
llvm-svn: 227318
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.
Differential Revision: http://reviews.llvm.org/D7196
llvm-svn: 227308
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.
llvm-svn: 227307
By Asaf Badouh and Elena Demikhovsky
Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.
llvm-svn: 227303
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
- the predicate is olt or uge
- the first operand is provably not less than zero
- the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..
http://reviews.llvm.org/D6972
llvm-svn: 227298
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7214
llvm-svn: 227284
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).
llvm-svn: 227272
COMDATs must be identically named to the symbol. When support for COMDATs was
introduced, the symbol rewriter was not updated, resulting in rewriting failing
for symbols which were placed into COMDATs. This corrects the behaviour and
adds test cases for this.
llvm-svn: 227261
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).
The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.
The now-useful testcases expose another, entangled, stpcpy problem,
with the further simplification. This was introduced in a
refactoring (r225640) to match the original behavior.
However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.
For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW. When trying to simplify say
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
and simplifies the first stpcpy to a memcpy. We now have
two memcpys.
llvm-svn: 227250
Splitting a loop to make range checks redundant is profitable only if
the range check "never" fails. Make this fact a part of recognizing a
range check -- a branch is a range check only if it is expected to
pass (via branch_weights metadata).
Differential Revision: http://reviews.llvm.org/D7192
llvm-svn: 227249
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).
The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector.
Instead of producing this:
vmovaps %xmm0, (%rdi)
vextractf128 $1, %ymm0, 16(%rdi)
This patch merges the 128-bit stores into a single 256-bit store:
vmovups %ymm0, (%rdi)
Differential Revision: http://reviews.llvm.org/D7208
llvm-svn: 227242
If a memory access is unaligned, emit __tsan_unaligned_read/write
callbacks instead of __tsan_read/write.
Required to change semantics of __tsan_unaligned_read/write to not do the user memory.
But since they were unused (other than through __sanitizer_unaligned_load/store) this is fine.
Fixes long standing issue 17:
https://code.google.com/p/thread-sanitizer/issues/detail?id=17
llvm-svn: 227231
No other test I know shows how struct names are mangled in overloaded
intrinsic functions.
Differential Revision: http://reviews.llvm.org/D7037
llvm-svn: 227229
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.
Added test InstCombine/select-cmp-cttz-ctlz.ll.
llvm-svn: 227197
LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lied outside of the
loop. We would wrongly perform this operation on Constants.
This fixes PR22337.
llvm-svn: 227171
These tests check that the combination of 227110 (cross block query inst) and 227112 (volatile load semantics) work together properly to allow PRE in cases where a loop contains a volatile access.
llvm-svn: 227146
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.
Differential Revision: http://reviews.llvm.org/D7178
llvm-svn: 227145
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.
Differential Revision: http://reviews.llvm.org/D7180
llvm-svn: 227141
- Rename mmx-builtins to mmx-intrinsics to match other intrinsic test naming.
- Remove tests that duplicate functionality from mmx-intrinsics.ll.
- Move arith related tests to mmx-arith.ll.
- MMX related shuffle goes to vector-shuffle-mmx.ll.
llvm-svn: 227130
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.
Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.
Differential Revision: http://reviews.llvm.org/D6471
llvm-svn: 227125
Instead of creating a pattern like "(p && a) || ((!p) && b)",
just expand the i8 operands to i32 and perform the selp on them.
Fixes PR22246
llvm-svn: 227123
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.
The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case seperately in the near future.
Differential Revision: http://reviews.llvm.org/D6901
llvm-svn: 227112
This patch fixes the following miscompile:
define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
%0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind
%a0 = extractelement <2 x double> %0, i32 0
%conv = fptrunc double %a0 to float
%a1 = extractelement <2 x double> %0, i32 1
%conv3 = fptrunc double %a1 to float
tail call void @callee2(float %conv, float %conv3) nounwind
ret void
}
Current codegen:
sqrtsd %xmm0, %xmm1 ## high element of %xmm1 is undef here
xorps %xmm0, %xmm0
cvtsd2ss %xmm1, %xmm0
shufpd $1, %xmm1, %xmm1
cvtsd2ss %xmm1, %xmm1 ## operating on undef value
jmp _callee
This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 )
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 );
this should be the final patch needed to resolve that bug.
Differential Revision: http://reviews.llvm.org/D6885
llvm-svn: 227111
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.
Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.
Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself.
Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.
Differential Revision: http://reviews.llvm.org/D6895
llvm-svn: 227110
than on MipsSubtargetInfo.
This required a bit of massaging in the MC level to handle this since
MC is a) largely a collection of disparate classes with no hierarchy,
and b) there's no overarching equivalent to the TargetMachine, instead
only the subtarget via MCSubtargetInfo (which is the base class of
TargetSubtargetInfo).
We're now storing the ABI in both the TargetMachine level and in the
MC level because the AsmParser and the TargetStreamer both need to
know what ABI we have to parse assembly and emit objects. The target
streamer has a pointer to the one in the asm parser and is updated
when the asm parser is created. This is fragile as the FIXME comment
notes, but shouldn't be a problem in practice since we always
create an asm parser before attempting to emit object code via the
assembler. The TargetMachine now contains the ABI so that the DataLayout
can be constructed dependent upon ABI.
All testcases have been updated to use the -target-abi command line
flag so that we can set the ABI without using a subtarget feature.
Should be no change visible externally here.
llvm-svn: 227102
Summary:
This patch adds support for some operations that were missing from
128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these
changes we can support the __int128_t and __uint128_t data types
from C/C++.
Depends on D7125
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7143
llvm-svn: 227089
This reverts commit r227003. Support for addition/subtraction and
various other operations for the i128 data type will be added in a
future commit based on the review D7143.
llvm-svn: 227082
-no-exec-stack. This was due to it not deriving from the correct
asm info base class and missing the override for the exec
stack section query. Added another line to the noexec test
line to make sure this doesn't regress.
llvm-svn: 227074
physical register that is described in a DBG_VALUE.
In the testcase the DBG_VALUE describing "p5" becomes unavailable
because the register its address is in is clobbered and we (currently)
aren't smart enough to realize that the value is rematerialized immediately
after the DBG_VALUE and/or is actually a stack slot.
llvm-svn: 227056
It appears we have different behavior with and without -mcpu=pwr8 even
with ppc64le defaulting to POWER8. The failure appears as follows:
/home/bb/cmake-llvm-x86_64-linux/llvm-project/llvm/test/CodeGen/PowerPC/ppc64le-aggregates.ll:268:14: error: expected string not found in input
; CHECK-DAG: lfs 1, 0([[REG]])
^
<stdin>:497:11: note: scanning from here
ld 3, .LC1@toc@l(3)
^
<stdin>:497:11: note: with variable "REG" equal to "3"
ld 3, .LC1@toc@l(3)
^
<stdin>:514:2: note: possible intended match here
lfs 1, 0(4)
^
Reverting this particular test case change. Nemanja, please have a look
at the reason for the failure.
llvm-svn: 227055
Test by Nemanja Ivanovic.
Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.
Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.
In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.
llvm-svn: 227053
MIPS64 ELF file has a very specific relocation record format. Each
record might specify up to three relocation operations. So the `r_info`
field in fact consists of three relocation type sub-fields and optional
code of "special" symbols.
http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.pdf
page 40
The patch implements support of the MIPS64 relocation record format in
yaml2obj/obj2yaml tools by introducing new optional Relocation fields:
Type2, Type3, and SpecSym. These fields are recognized only if the
object/YAML file relates to the MIPS64 target.
Differential Revision: http://reviews.llvm.org/D7136
llvm-svn: 227044
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store size of v8i1, v4i1 and v2i1 are changed to 8 bits
llvm-svn: 227043
These tests used to test the legacy JIT but since that has been removed they're
just redundantly testing MCJIT. Remove them and just leave their counterparts in
test/ExecutionEngine/MCJIT.
llvm-svn: 227010
Summary:
V8->V9:
- cleanup tests
V7->V8:
- addressed feedback from David:
- switched to range-based 'for' loops
- fixed formatting of tests
V6->V7:
- rebased and adjusted AsmPrinter args
- CamelCased .td, fixed formatting, cleaned up names, removed unused patterns
- diffstat: 3 files changed, 203 insertions(+), 227 deletions(-)
V5->V6:
- addressed feedback from Chandler:
- reinstated full verbose standard banner in all files
- fixed variables that were not in CamelCase
- fixed names of #ifdef in header files
- removed redundant braces in if/else chains with single statements
- fixed comments
- removed trailing empty line
- dropped debug annotations from tests
- diffstat of these changes:
46 files changed, 456 insertions(+), 469 deletions(-)
V4->V5:
- fix setLoadExtAction() interface
- clang-formated all where it made sense
V3->V4:
- added CODE_OWNERS entry for BPF backend
V2->V3:
- fix metadata in tests
V1->V2:
- addressed feedback from Tom and Matt
- removed top level change to configure (now everything via 'experimental-backend')
- reworked error reporting via DiagnosticInfo (similar to R600)
- added few more tests
- added cmake build
- added Triple::bpf
- tested on linux and darwin
V1 cover letter:
---------------------
recently linux gained "universal in-kernel virtual machine" which is called
eBPF or extended BPF. The name comes from "Berkeley Packet Filter", since
new instruction set is based on it.
This patch adds a new backend that emits extended BPF instruction set.
The concept and development are covered by the following articles:
http://lwn.net/Articles/599755/http://lwn.net/Articles/575531/http://lwn.net/Articles/603983/http://lwn.net/Articles/606089/http://lwn.net/Articles/612878/
One of use cases: dtrace/systemtap alternative.
bpf syscall manpage:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe
instruction set description and differences vs classic BPF:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt
Short summary of instruction set:
- 64-bit registers
R0 - return value from in-kernel function, and exit value for BPF program
R1 - R5 - arguments from BPF program to in-kernel function
R6 - R9 - callee saved registers that in-kernel function will preserve
R10 - read-only frame pointer to access stack
- two-operand instructions like +, -, *, mov, load/store
- implicit prologue/epilogue (invisible stack pointer)
- no floating point, no simd
Short history of extended BPF in kernel:
interpreter in 3.15, x64 JIT in 3.16, arm64 JIT, verifier, bpf syscall in 3.18, more to come in the future.
It's a very small and simple backend.
There is no support for global variables, arbitrary function calls, floating point, varargs,
exceptions, indirect jumps, arbitrary pointer arithmetic, alloca, etc.
From C front-end point of view it's very restricted. It's done on purpose, since kernel
rejects all programs that it cannot prove safe. It rejects programs with loops
and with memory accesses via arbitrary pointers. When kernel accepts the program it is
guaranteed that program will terminate and will not crash the kernel.
This patch implements all 'must have' bits. There are several things on TODO list,
so this is not the end of development.
Most of the code is a boiler plate code, copy-pasted from other backends.
Only odd things are lack or < and <= instructions, specialized load_byte intrinsics
and 'compare and goto' as single instruction.
Current instruction set is fixed, but more instructions can be added in the future.
Signed-off-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subscribers: majnemer, chandlerc, echristo, joerg, pete, rengolin, kristof.beyls, arsenm, t.p.northover, tstellarAMD, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6494
llvm-svn: 227008
Summary:
At the moment, address calculation is taking the debug line info from the
address node (e.g. TargetGlobalAddress). When a function is called multiple
times, this results in output of the form:
.loc $first_call_location
.. address calculation ..
.. function call ..
.. address calculation ..
.loc $second_call_location
.. function call ..
.loc $first_call_location
.. address calculation ..
.loc $third_call_location
.. function call ..
This patch makes address calculations for function calls take the debug line
info for the call node and results in output of the form:
.loc $first_call_location
.. address calculation ..
.. function call ..
.loc $second_call_location
.. address calculation ..
.. function call ..
.loc $third_call_location
.. address calculation ..
.. function call ..
All other address calculations continue to use the address node.
Test Plan: Fixes test/DebugInfo/multiline.ll on a mips host.
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D7050
llvm-svn: 227005
Summary:
In addition to the included tests, this fixes
test/CodeGen/Generic/i128-addsub.ll on a mips64 host.
Reviewers: atanasyan, sagar, vmedic
Reviewed By: vmedic
Subscribers: sdkie, llvm-commits
Differential Revision: http://reviews.llvm.org/D6610
llvm-svn: 227003
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.
Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.
llvm-svn: 227002
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper fuction, and adds
a trivial wrapper for the new PM as well. Not much to see here.
I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).
llvm-svn: 226999
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.
This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.
I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.
llvm-svn: 226987
Eventually we can make some of these pass the error along to the caller.
Reports a fatal error if:
We find an invalid abbrev record
We try to get an invalid abbrev number
We can't fill the current word due to an EOF
Fixed an invalid bitcode test to check for output with FileCheck
Bugs found with afl-fuzz
llvm-svn: 226986
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.
<rdar://problem/19583480>
llvm-svn: 226978
These tests are asserting and crashing for me, and 'not' sees that as a
non-zero exit code instead of a signal code for obscure Windows reasons.
This causes the test to pass, giving me an unclean 'ninja check'.
The test is already XFAILd, so just run the test without 'not' and let
lit handle the failure.
llvm-svn: 226958
Handle the poor codegen for i64/x86xmm->v2i64 (%mm -> %xmm) moves. Instead of
using stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the current behavior prior to
improvements for vector legalization of extloads in r213897.
This commit fixes the regression and as a side-effect also remove some
unnecessary shuffles.
In the new attached testcase, we go from:
pshufw $-18, (%rdi), %mm0
movq %mm0, -8(%rsp)
movq -8(%rsp), %xmm0
pshufd $-44, %xmm0, %xmm0
movd %xmm0, %eax
...
To:
pshufw $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd %xmm0, %eax
...
Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324
llvm-svn: 226953
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.
It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.
llvm-svn: 226945