The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
llvm-svn: 233024
This enables very common cases to switch to the
smaller encoding.
All of the standard LLVM canonicalizations of comparisons
are the opposite of what we want. Compares with constants
are moved to the RHS, but the first operand can be an inline
immediate, literal constant, or SGPR using the 32-bit VOPC
encoding.
There are additional bad canonicalizations that should
also be fixed, such as canonicalizing ge x, k to gt x, (k + 1)
if this makes k no longer an inline immediate value.
llvm-svn: 232988
This change is incorrect since it converts double rounding into single rounding,
which can produce different results. Instead this optimization will be done by
modifying Clang's codegen to not produce double rounding in the first place.
This reverts commit r232954.
llvm-svn: 232962
Specifically when the conversion is done in two steps, f16 -> f32 -> f64.
For example:
%1 = tail call float @llvm.convert.from.fp16.f32(i16 %0)
%conv = fpext float %1 to double
to:
vcvtb.f64.f16
llvm-svn: 232954
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
llvm-svn: 232943
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).
llvm-svn: 232935
A build directory with a name like `build-Werror` would hit a false
positive on these `CHECK-NOT`s before, since the actual error line looks
like:
.../build-Werror/bin/llvm-as <stdin>:1:2: error: ...
Switch to using:
CHECK-NOT: error:
(note the trailing semi-colon) to avoid matching almost any file path.
llvm-svn: 232917
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.
int foo(char C) { return strchr("123!", C) != nullptr; } now becomes
cmpl $64, %edi ## range check
sbbb %al, %al
movabsq $0xE000200000001, %rcx
btq %rdi, %rcx ## bit test
sbbb %cl, %cl
andb %al, %cl ## and the two conditions
andb $1, %cl
movzbl %cl, %eax ## returning an int
ret
(imho the backend should expand this into a series of branches, but
that's a different story)
The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting a i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
llvm-svn: 232902
r216771 introduced a change to MemoryDependenceAnalysis that allowed it
to reason about acquire/release operations. However, this change does
not ensure that the acquire/release operations pair. Unfortunately,
this leads to miscompiles as we won't see an acquire load as properly
memory effecting. This largely reverts r216771.
This fixes PR22708.
llvm-svn: 232889
As preparation for removing the getSubtargetImpl() call from
TargetMachine go ahead and flip the switch on caching the function
dependent subtarget and remove the bare getSubtargetImpl call
from the X86 port. As part of this add a few tests that show we
can generate code and assemble on X86 based on features/cpu on
the Function.
llvm-svn: 232879
If we couldn't analyze its terminator (i.e., it's an indirectbr, or some
other weirdness), we can't safely re-if-convert a predicated block,
because we can't tell whether the predicated terminator can
fallthrough (it does).
Currently, we would completely ignore the fallthrough successor. In
the added testcase, this means we used to generate:
...
@ %entry:
cmp r5, #21
ittt ne
@ %cc1f:
cmpne r7, #42
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %cc1t:
...
Whereas the successor of %cc1f was originally %bb1.
With the fix, we get the correct:
...
@ %entry:
cmp r5, #21
itt eq
@ %cc1t:
streq.w r5, [r11]
moveq pc, r0
@ %cc1f:
cmp r7, #42
itt ne
@ %cc2t:
strne.w r5, [r8]
movne pc, r10
@ %bb1:
...
rdar://20192768
Differential Revision: http://reviews.llvm.org/D8509
llvm-svn: 232872
vperm2* intrinsics are just shuffles.
In a few special cases, they're not even shuffles.
Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:
1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
really angry (and so I have regrets about some patches from last week).
2. Doing mask conversion logic in header files is hard to write and
subsequently read.
There are a couple of TODOs in this patch to complete this optimization.
Differential Revision: http://reviews.llvm.org/D8486
llvm-svn: 232852
With this patch, for this one exact case, we'll generate:
blendps %xmm0, %xmm1, $1
instead of:
insertps %xmm0, %xmm1, $0
If there's a memory operand available for load folding and we're
optimizing for size, we'll still generate the insertps.
The detailed performance data motivation for this may be found in D7866;
in summary, blendps has 2-3x throughput vs. insertps on widely used chips.
Differential Revision: http://reviews.llvm.org/D8332
llvm-svn: 232850
The code this patch removes was there to make sure the text sections went
before the dwarf sections. That is necessary because MachO uses offsets
relative to the start of the file, so adding a section can change relaxations.
The dwarf sections were being printed at the start just to produce symbols
pointing at the start of those sections.
The underlying issue was fixed in r231898. The dwarf sections are now printed
when they are about to be used, which is after we printed the text sections.
To make sure we don't regress, the patch makes the MachO streamer assert
if CodeGen puts anything unexpected after the DWARF sections.
llvm-svn: 232842
The main differences are:
* Split in 32 and 64 bit functions.
* First switch on the Modifier so that we have only one non fully covered
switch.
* Map the fixup kind first to a x86_64 (or i386) specific enum, to make
it easy to handle cases like X86::reloc_riprel_4byte_movq_load.
* Switch on IsPCRel last, which reduces code duplication.
Fixes pr22308.
llvm-svn: 232837
A WIP patch makes `DIDescriptor` accessors more strict, which in turn
causes the `DebugInfoFinder` to crash on wrongly typed `!dbg`
attachments. Catch that error up front in
`Verifier::visitInstruction()`.
Also remove a test that we "handle" invalid `!dbg` attachments, added
back in r99938. We don't want to handle those anymore.
Note: I'm *not* recursing and verifying the debug info graph reachable
from this node; that work is already done by `verifyDebugInfo()`.
llvm-svn: 232834
This test is supposed to be testing whether metadata attachments to
instructions work, but it was using invalid debug info to do so. (This
was causing assertion failures in the `DebugInfoFinder` with a WIP patch
to be more strict about `DIDescriptor` accessors.)
Rather than fix the debug info -- which is better tested elsewhere --
just test the IR feature directly.
llvm-svn: 232828
When estimating SROA savings, we want to see if an address is derived
off an alloca in the caller. For store instructions, operand 1 is the
address operand, but the current code uses operand 0. Use
getPointerOperand for loads and stores to fix this.
Patch by Easwaran Raman.
http://reviews.llvm.org/D8425
llvm-svn: 232827
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
llvm-svn: 232825
With the option -outline-optional-branches, LLVM will place optional
branches out of line (more details on r231230).
With this patch, this is not done for short optional branches. A short
optional branch is a branch containing a single block with an
instruction count below a certain threshold (defaulting to 3). Still
everything is guarded under -outline-optional-branches).
Outlining a short branch can't significantly improve code locality. It
can however decrease performance because of the additional jmp and in
cases where the optional branch is hot. This fixes a compile time
regression I have observed in a benchmark.
Review: http://reviews.llvm.org/D8108
llvm-svn: 232802
This works in a similar way to the gold plugin tests. We search for a compatible
linker on $PATH and use it to run tests against our just-built libLTO. To start
with, test the just added opt level functionality.
Differential Revision: http://reviews.llvm.org/D8472
llvm-svn: 232785
This is very related to the bug fixed in r174431. The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores. When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.
I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.
llvm-svn: 232780
This switches the sense of the i32 values and updates the test cases.
We can also use CHECK-SAME to clean up some tests, and reduce the visual
noise from bitcasts.
llvm-svn: 232774
Another case of x86-specific shuffle strength reduction:
avoid generating insert*128 instructions with index 0 because
they are slower than their non-lane-changing blend equivalents.
Shuffle lowering already catches most of these cases, but
the zero vector case and some other paths such as in the
modified test in vector-shuffle-256-v32.ll were getting
through.
Differential Revision: http://reviews.llvm.org/D8366
llvm-svn: 232773
Each use of the byte array uses a different alias. This makes the
backend less likely to reuse previously computed byte array addresses,
improving the security of the CFI mechanism based on this pass.
Differential Revision: http://reviews.llvm.org/D8455
llvm-svn: 232770
This change also introduces a link-time optimization level of 1. This
optimization level runs only the globaldce pass as well as cleanup passes for
passes that run at -O0, specifically simplifycfg which cleans up lowerbitsets.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150316/266951.html
llvm-svn: 232769
Summary:
CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect
and that triggers an assertion in nvvm_reflect optimization pass. This
change allows nvvm_reflect pass to deal with both old and new ways to
pass an argument to __nvvm_reflect.
Test Plan: ninja check-all
Reviewers: eliben, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D8399
llvm-svn: 232732
This should bring the windows bots back.
It is a bit ugly, but it is better than what we had before: The triple would
say that the object format was COFF, but llc/llvm-mc would produce an ELF.
llvm-svn: 232683
No outlining is necessary for SEH catch blocks. Use the blockaddr of the
handler in place of the usual outlined function.
Reviewers: majnemer, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D8370
llvm-svn: 232664
Currently v2i64 vectors shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets.
This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware).
Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware.
Differential Revision: http://reviews.llvm.org/D8416
llvm-svn: 232660
When calculating the lanemask of a register class we have to include the
masks of subregisters supported by any of the class members, not just
the ones supported by all class members.
This fixes problems when coalescing towards a subclass with additional
subregisters available.
The attached testcase works fine as is, but does crash if you enable
subregister liveness on x86 without this change applied.
llvm-svn: 232652
The checks here were so vague that we could nuke intrinsics
from existence and still pass the test because we'd match
the function name.
llvm-svn: 232647
The 'vmovntdq' was only passing due to a fluke in
SandyBridge codegen that splits 32-byte stores in half,
but that meant that the test was not correctly checking
for the 32-byte store that we thought we were generating.
The lax checking in this file will be addressed in
another commit. There are bigger problems here.
llvm-svn: 232644
The two hot blocks are right next to each other and I verified that
there is no performance regression by compressing/uncompressing some
files with a minigzip built with the different options.
llvm-svn: 232629
Memcpy, and other memory intrinsics, typically tries to use LDM/STM if
the source and target addresses are 4-byte aligned. In CodeGenPrepare
look for calls to memory intrinsics and, if the object is on the
stack, 4-byte align it if it's large enough that we expect that memcpy
would want to use LDM/STM to copy it.
Differential Revision: http://reviews.llvm.org/D7908
llvm-svn: 232627
These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous. From a private patch equivalent to r189742/PR17028.
llvm-svn: 232580
Summary:
This change teaches isImpliedCond to infer things like "X sgt 0" => "X -
1 sgt -1". The `ConstantRange` class has the logic to do the heavy
lifting, this change simply gets ScalarEvolution to exploit that when
reasonable.
Depends on D8345
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8346
llvm-svn: 232576
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x). This saves an instruction on
architectures like X86 and POWER(64).
Differential Revision: http://reviews.llvm.org/D8350
llvm-svn: 232572
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Differential Revision: http://reviews.llvm.org/D8394
llvm-svn: 232570
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8369
llvm-svn: 232562
Summary:
COFF COMDATs (for selection kinds other than 'select any') require at
least one non-section symbol in the symbol table.
Satisfy this by morally enhancing the linkage from private to internal.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8374
llvm-svn: 232539
Before this patch code wanting to create temporary labels for a given entity
(function, cu, exception range, etc) had to keep its own counter to have stable
symbol names.
createTempSymbol would still add a suffix to make sure a new symbol was always
returned, but it kept a single counter. Because of that, if we were to use
just createTempSymbol("cu_begin"), the label could change from cu_begin42 to
cu_begin43 because some other code started using temporary labels.
Simplify this by just keeping one counter per prefix and removing the various
specialized counters.
llvm-svn: 232535
Also, add several entries to vectorizable functions table, and
corresponding tests. The table isn't complete, it'll be populated later.
Review: http://reviews.llvm.org/D8131
llvm-svn: 232531
The input offset to needsFrameBaseReg is a negative value below the top of the
stack frame, but when converting to a positive offset from the bottom of the
stack frame this value was negated, causing the final offset to be too large
by twice the input offset's magnitude. Fix that by not negating the offset.
Patch by John Brawn
Differential Revision: http://reviews.llvm.org/D8316
llvm-svn: 232513
The experiments can be used to evaluate potential optimizations that remove
instrumentation (assess false negatives). Instead of completely removing
some instrumentation, you set Exp to a non-zero value (mask of optimization
experiments that want to remove instrumentation of this instruction).
If Exp is non-zero, this pass will emit special calls into runtime
(e.g. __asan_report_exp_load1 instead of __asan_report_load1). These calls
make runtime terminate the program in a special way (with a different
exit status). Then you run the new compiler on a buggy corpus, collect
the special terminations (ideally, you don't see them at all -- no false
negatives) and make the decision on the optimization.
The exact reaction to experiments in runtime is not implemented in this patch.
It will be defined and implemented in a subsequent patch.
http://reviews.llvm.org/D8198
llvm-svn: 232502
The VSX stores are sometimes generated with a undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) requires R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address.
llvm-svn: 232486
Summary:
This adds a MipsInstAlias which expands to XORi $reg,$reg,imm. For example, "xor $6, 0x3A" should be expanded to "xori $6, $6, 58".
This should work for all MIPS ISAs.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8284
llvm-svn: 232473
ARMv6K is another layer between ARMV6 and ARMV6T2. This is the LLVM
side of the changes.
ARMV6 family LLVM implementation.
+-------------------------------------+
| ARMV6 |
+----------------+--------------------+
| ARMV6M (thumb) | ARMV6K (arm,thumb) | <- From ARMV6K and ARMV6M processors
+----------------+--------------------+ have support for hint instructions
| ARMV6T2 (arm,thumb,thumb2) | (SEV/WFE/WFI/NOP/YIELD). They can
+-------------------------------------+ be either real or default to NOP.
| ARMV7 (arm,thumb,thumb2) | The two processors also use
+-------------------------------------+ different encoding for them.
Patch by Vinicius Tinti.
llvm-svn: 232468
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality, e.g.,
(v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
(v2i16 (truncate (v2i64)))))
->
(v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
(v2i32 (truncate (v2i64)))))))
This isn't really target-specific, and, as such, would best go in the
DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387
llvm-svn: 232459
Re-commit the test cases added in r232444. These now use
-irce-print-changed-loops and -irce-print-range-checks so they run
correctly on a without asserts build of llvm.
llvm-svn: 232452
I accidentally checked in two tests that used -debug-only -- these fail
on a release LLVM build. Temporarily delete these from the repo to keep
the bots green while I fix this locally.
llvm-svn: 232446
This change to IRCE gets it to recognize "half" range checks. Half
range checks are range checks that only either check if the index is
`slt` some positive integer ("length") or if the index is `sge` `0`.
The range solver does not try to be clever / aggressive about solving
half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
to "0 <= I < INT_SMAX". This is safe, but not always optimal.
llvm-svn: 232444
By default we want our gcov emission to stay 4.2 compatible, which
means we need to continue emit the exit block last by default. We add
an option to emit it before the body for users that need it.
llvm-svn: 232438
This makes the reader check the endianness of the object file its
given and behave appropriately. For the test I dug up a really old
linker and created a ppc-apple-darwin file for llvm-cov to read.
llvm-svn: 232422
We removed @llvm.eh.typeid.for.i32 and replaced it with
@llvm.eh.typeid.for quite some time ago. Fix up some test cases which
never got updated.
llvm-svn: 232421
(turns out I had regressed this when sinking handling of this type down
into GetElementPtrInst::Create - since that asserted before the error
handling was performed)
llvm-svn: 232420
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:
- Empty `filename:` fields in `MDFile`s. Compile units and some types
require non-empty filenames. A number of testcases have empty
filenames, probably due to hand-reduction of testcases.
- Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in
the debug info schema to `!{}`. They cause problems for
`!MDSubroutineType`'s `types:` array, since it requires all operands
to be valid types. (Note that `!{null}` is the correct type array
for functions that take no arguments and return `void`.)
- Significantly bitrotted testcases. Nodes got left behind a few
upgrades ago because of missing or invalid tags.
llvm-svn: 232415
Turns out `visitIntrinsicFunctionCall()` descends into all operands
already, so explicitly descending in `visitDbgIntrinsic()` (part of
r232296) isn't useful.
Updating a testcase that doesn't really need `-verify-debug-info` (since
r231082) as confirmation.
llvm-svn: 232408
r230877 optimized which fields are written out for `CHECK`-ability, but
apparently missed changing some of them to optional in `LLParser`.
Fixes PR22921.
llvm-svn: 232400
Fix justify error for small structures bigger than 32 bits in fixed
arguments for MIPS64 big endian. There was a problem when small structures
are passed as fixed arguments. The structures that are bigger than 32 bits
but smaller than 64 bits were not left justified properly on MIPS64 big
endian. This is fixed by shifting the value to make it left justified when
appropriate.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D8174
llvm-svn: 232382
This still doesn't actually work correctly for big endian input files,
but since these tests all use little endian input files they don't
actually fail. I'll be committing a real fix for big endian soon, but
I don't have proper tests for it yet.
llvm-svn: 232354
The problem here is the infamous one direction known safe. I was
hesitant to turn it off before b/c of the potential for regressions
without an actual bug from users hitting the problem. This is that bug ;
).
The main performance impact of having known safe in both directions is
that often times it is very difficult to find two releases without a use
in-between them since we are so conservative with determining potential
uses. The one direction known safe gets around that problem by taking
advantage of many situations where we have two retains in a row,
allowing us to avoid that problem. That being said, the one direction
known safe is unsafe. Consider the following situation:
retain(x)
retain(x)
call(x)
call(x)
release(x)
Then we know the following about the reference count of x:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 1 (since we can not release a deallocated pointer).
release(x)
// rc(x) >= 0
That is all the information that we can know statically. That means that
we know that A(x), B(x) together can release (x) at most N+1 times. Lets
say that we remove the inner retain, release pair.
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
release(x)
// rc(x) >= 0
We knew before that A(x), B(x) could release x up to N+1 times meaning
that rc(x) may be zero at the release(x). That is not safe. On the other
hand, consider the following situation where we have a must use of
release(x) that x must be kept alive for after the release(x)**. Then we
know that:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
retain(x)
// rc(x) == N+2
call A(x)
call B(x)
// rc(x) >= 2 (since we know that we are going to release x and that that release can not be the last use of x).
release(x)
// rc(x) >= 1 (since we can not deallocate the pointer since we have a must use after x).
…
// rc(x) >= 1
use(x)
Thus we know that statically the calls to A(x), B(x) can together only
release rc(x) N times. Thus if we remove the inner retain, release pair:
// rc(x) == N (for some N).
retain(x)
// rc(x) == N+1
call A(x)
call B(x)
// rc(x) >= 1
…
// rc(x) >= 1
use(x)
We are still safe unless in the final … there are unbalanced retains,
releases which would have caused the program to blow up anyways even
before optimization occurred. The simplest form of must use is an
additional release that has not been paired up with any retain (if we
had paired the release with a retain and removed it we would not have
the additional use). This fits nicely into the ARC framework since
basically what you do is say that given any nested releases regardless
of what is in between, the inner release is known safe. This enables us to get
back the lost performance.
<rdar://problem/19023795>
llvm-svn: 232351
This code was casting regions of a memory buffer to a couple of
different structs. This is wrong in a few ways:
1. It breaks aliasing rules.
2. If the buffer isn't aligned, it hits undefined behaviour.
3. It completely ignores endianness differences.
4. The structs being defined for this aren't specifying their padding
properly, so this doesn't even represent the data properly on some
platforms.
This commit is mostly NFC, except that it fixes reading coverage for
32 bit binaries as a side effect of getting rid of the mispadded
structs. I've included a test for that.
I've also baked in that we only handle little endian more explicitly,
since that was true in practice already. I'll fix this to handle
endianness properly in a followup commit.
llvm-svn: 232346
The information gathering part of the patch stores a bit more information
than what is strictly necessary for these 2 sections. The rest will
become useful when we start emitting __apple_* type accelerator tables.
llvm-svn: 232342
This code comes with a lot of cruft that is meant to mimic darwin's
dsymutil behavior. A much simpler approach (described in the numerous
FIXMEs that I put in there) gives the right output for the vast
majority of cases. The extra corner cases that are handled differently
need to be investigated: they seem to correctly handle debug info that
is in the input, but that info looks suspicious in the first place.
Anyway, the current code needs to handle this, but I plan to revisit it
as soon as the big round of validation against the classic dsymutil is
over.
llvm-svn: 232333
The debug map embedded by ld64 in binaries conatins function sizes.
These sizes are less precise than the ones given by the debug information
(byte granularity vs linker atom granularity), but they might cover code
that is referenced in the line table but not in the DIE tree (that might
very well be a compiler bug that I need to investigate later).
Anyway, extracting that information is necessary to be able to mimic
dsymutil's behavior exactly.
llvm-svn: 232300
Verify that debug info intrinsic arguments are valid. (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)
With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).
Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.
If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful. The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).
llvm-svn: 232296
This test for function-local metadata did strange things, and never
really sent in valid arguments for `llvm.dbg.declare` and
`llvm.dbg.value` intrinsics. Those that might have once been valid have
bitrotted.
Rewrite it to be a targeted test for function-local metadata --
unrelated to debug info, which is tested elsewhere -- and rename it to
better match other metadata-related tests.
(Note: the scope of function-local metadata changed drastically during
the metadata/value split, but I didn't properly clean up this testcase.
Most of the IR in this file, while invalid for debug info intrinsics,
used to provide coverage for various (now illegal) forms of
function-local metadata.)
llvm-svn: 232290
Summary: This is a first step toward getting proper support for aggregate loads and stores.
Test Plan: Added unittests
Reviewers: reames, chandlerc
Reviewed By: chandlerc
Subscribers: majnemer, joker.eph, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7780
Patch by Amaury Sechet
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
There is no need to look into the location expressions to transfer them,
the only modification to apply is to patch their base address to reflect
the linked function address.
llvm-svn: 232267
Specifically, if there are copy-like instructions in the loop header
they are moved into the loop close to their uses. This reduces the live
intervals of the values and can avoid register spills.
This is working towards a fix for http://llvm.org/PR22230.
Review: http://reviews.llvm.org/D7259
Next steps:
- Find a better cost model (which non-copy instructions should be sunk?)
- Make this dependent on register pressure
llvm-svn: 232262
This actually shares most of its implementation with the generation
of the debug_ranges (the absence of 'a' is not a typo) contribution
for the unit's DW_AT_ranges attribute.
llvm-svn: 232246
The linker on that platform may re-order symbols or strip dead symbols, which
will break bit set checks. Avoid this by hiding the symbols from the linker.
llvm-svn: 232235
Nothing fancy, just a straightforward offset to apply to the original
debug_ranges entries to get them in line with the linked addresses.
llvm-svn: 232232
This fixes pr22854.
The core issue on the bug is that there are multiple instructions that
print the same in assembly. In fact, there doesn't seem to be any
syntax for specifying that a constant that fits in 8 bits should use a 32 bit
immediate.
The attached patch changes fast isel to consider i16immSExt8,
i32immSExt8, and i64immSExt8. They were disabled because fastisel didn’t know
to call the predicate back in the day.
llvm-svn: 232223
This reapplies the patch previously committed at revision 232190. This was
reverted at revision 232196 as it caused test failures in tests that did not
expect operands to be commuted. I have made the tests more resilient to
reassociation in revision 232206.
llvm-svn: 232209
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`. Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often. Nevertheless, it's
a cheap check and a nice cleanup.
llvm-svn: 232202
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.
The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)
The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`. Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.
I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations. (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)
An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.
A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.
rdar://problem/20075773
llvm-svn: 232200
We recorded the forward references in the CU that holds the referenced
DIE, but this is wrong as those will get resoled *after* the CU that
holds the reference. Record the references in their originating CU along
with a pointer to the remote CU to be able to compute the fixed up
offset at the right time.
llvm-svn: 232193
The typo got unnoticed because we were testing only on Dwarf 2. Add a
Dwarf4 test that exercises the code path, and also tests some newer
FORMs that the other test doesn't cover.
llvm-svn: 232191
This patch adds initial support for vector instructions to the reassociation
pass. It enables most parts of the pass to work with vectors but to keep the
size of the patch small, optimization of Xor trees, canonicalization of
negative constants and converting shifts to muls, etc., have been left out.
This will be handled in later patches.
The patch is based on an initial patch by Chad Rosier.
Differential Revision: http://reviews.llvm.org/D7566
llvm-svn: 232190
Summary:
ScalarEvolutionExpander assumes that the header block of a loop is a
legal place to have a use for a phi node. This is true only for phis
that are either in the header or dominate the header block, but it is
not true for phi nodes that are strictly internal to the loop body.
This change teaches ScalarEvolutionExpander to place uses of PHI nodes
in the basic block the PHI nodes belong to. This is always legal, and
`hoistIVInc` ensures that the said position dominates `IsomorphicInc`.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8311
llvm-svn: 232189
Similar to gep (r230786) and load (r230794) changes.
Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.
(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)
import fileinput
import sys
import re
rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)
def conv(match):
line = match.group(1)
line += match.group(4)
line += ", "
line += match.group(2)
return line
line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
sys.stdout.write(line[off:match.start()])
sys.stdout.write(conv(match))
off = match.end()
sys.stdout.write(line[off:])
llvm-svn: 232184
using numeric values and not their symbolic constant names.
The routines that print Mach-O stuff already had a verbose parameter and this
change is just changing the passing true to passing !NonVerbose. With just a
couple of fixes and a bunch of test case updates.
llvm-svn: 232182
This patch fixes a bug in the shuffle lowering logic implemented by function
'lowerV2X128VectorShuffle'.
The are few cases where function 'lowerV2X128VectorShuffle' wrongly expands a
shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR
nodes. The problematic expansion only occurs when the shuffle mask M has an
'undef' element at position 2, and M is equivalent to mask <0,1,4,5>.
In that case, the algorithm propagates the wrong vector to one of the two
new EXTRACT_SUBVECTOR nodes.
Example:
;;
define <4 x double> @test(<4 x double> %A, <4 x double> %B) {
entry:
%0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32><i32 undef, i32 1, i32 undef, i32 5>
ret <4 x double> %0
}
;;
Before this patch, llc (-mattr=+avx) generated:
vinsertf128 $1, %xmm0, %ymm0, %ymm0
With this patch, llc correctly generates:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
Added test lower-vec-shuffle-bug.ll
Differential Revision: http://reviews.llvm.org/D8259
llvm-svn: 232179
Constant folding for shift IR instructions ignores all bits above 32 of
second argument (shift amount).
Because of that, some undef results are not recognized and APInt can
raise an assert failure if second argument has more than 64 bits.
Patch by Paweł Bylica!
Differential Revision: http://reviews.llvm.org/D7701
llvm-svn: 232176
%Q5_Q6<def> = COPY %Q2_Q3
%D5<def> =
%D3<def> =
%D3<def> = COPY %D6 // Incorrectly removed in MachineCopyPropagation
Using of %D3 results in incorrect result ...
Reviewed in http://reviews.llvm.org/D8242
llvm-svn: 232142
There's a missed optimization opportunity where we could look at the full chain of computation and take the intersection of the flags instead of only looking one instruction deep.
llvm-svn: 232134
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.
This should complete the removal of insert/extract128 intrinsics.
The Clang precursor patch for this change was checked in at r232109.
llvm-svn: 232120
Instead print them as part of the $dst operand. The AsmMatcher
requires the 32-bit and 64-bit encodings have the same mnemonic in
order to parse them correctly.
llvm-svn: 232105
The permps and permd instructions have their operands swapped compared to the
intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP
category.
I did not create a new category for those two, as they are the only one AFAICT
in that case.
<rdar://problem/20108262>
llvm-svn: 232085
Part of the folding logic implemented by function 'PerformISDSETCCCombine'
only worked under the assumption that the condition code in input could have
been either SETNE or SETEQ.
Unfortunately that assumption was incorrect, and in some cases the algorithm
ended up incorrectly folding SETCC nodes.
The incorrect folding only affected SETCC dag nodes where:
- one of the operands was a build_vector of all zeroes;
- the other operand was a SIGN_EXTEND from a vector of MVT:i1 elements;
- the condition code was neither SETNE nor SETEQ.
Example:
(setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge)
Before this patch, the entire dag node sequence from the example was
incorrectly folded to node %A.
With this patch, the dag node sequence is folded to a
(xor %A, (v4i1 VectorOfAllOnes)).
Added test setcc-combine.ll.
Thanks to Greg Bedwell for spotting this issue.
llvm-svn: 232046
Now that we've replaced the vinsertf128 intrinsics,
do the same for their extract twins.
This is very much like D8086 (checked in at r231794):
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is also the LLVM sibling to the cfe D8275 patch.
Differential Revision: http://reviews.llvm.org/D8276
llvm-svn: 232045
It's firstly committed at r231630, and reverted at r231635.
Function pass InstructionSimplifier is inserted as barrier to
make sure loop unroll pass won't affect on LICM pass.
llvm-svn: 232011
Instead, run both EH preparation passes, and have them both ignore
functions with unrecognized EH personalities. Pass delegation involved
some hacky code for creating an AnalysisResolver that we don't need now.
llvm-svn: 231995
CodeGen incorrectly ignores (assert from APInt) constant index bigger
than 2^64 in getelementptr instruction. This is a test and fix for that.
Patch by Paweł Bylica!
Reviewed By: rnk
Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits
Differential Revision: http://reviews.llvm.org/D8219
llvm-svn: 231984
If a function is going in an unique section (because of -ffunction-sections
for example), putting a jump table in .rodata will keep .rodata alive and
that will keep alive any other function that also has a jump table.
Instead, put the jump table in a unique section that is associated with the
function.
llvm-svn: 231961
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
llvm-svn: 231959
DW_AT_low_pc on functions is taken care of by the relocation processing, but
DW_AT_high_pc and DW_AT_low_pc on other lexical scopes need special handling.
llvm-svn: 231955
This is a follow-up to r231182. This adds the "vbroadcasti128" instruction
back, but without the intrinsic mapping. Also add a test to check the
instriction encoding.
This is related to rdar://problem/18742778.
llvm-svn: 231945
Summary:
The generic ELF TargetObjectFile defaults to .ctors, but Linux's
defaults to .init_array by calling InitializeELF with the value of
UseInitArray from TargetMachine. Make NaCl's behavior match.
Reviewers: jvoung
Differential Revision: http://reviews.llvm.org/D8240
llvm-svn: 231934
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.
This patch makes sure that the Inliner does not add such an edge to the callgraph.
Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857
Differential Revision: http://reviews.llvm.org/D8231
llvm-svn: 231927
As of r231908, the test I added in r231902 actually gets run - but I'd
checked in a stale version of the input so it didn't pass. Fix the
input and un-xfail the test.
llvm-svn: 231911
This causes a crash if the referenced intrinsic was malformed. In this case, we
would already have reported an error on the referenced intrinsic, but then
crashed on the second one when it tried to introspect the first without
error checking.
llvm-svn: 231910
There were also errors in the CHECK line which I fixed and the test
doesn't actually pass as the "100" is in the wrong line. Not sure
whether this is a test failure or a coverage failure so making the test
XFAIL for now.
llvm-svn: 231908
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.
p.s. We should probably do the same thing for switch instructions.
Differential Revision: http://reviews.llvm.org/D8220
llvm-svn: 231881
There are still 4 tests that check for DW_AT_MIPS_linkage_name,
because they specify DWARF 2 or 3 in the module metadata. So, I didn't
create an explicit version-based test for the attribute.
Differential Revision: http://reviews.llvm.org/D8227
llvm-svn: 231880
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.
Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems. In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow. Even with that restriction, it handles many interesting cases:
* Early exits from functions
* Early exits from loops (for context instructions in the loop and after the check)
* Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.
This patch implements two approaches to the problem that need further benchmarking. Approach 1 is to directly walk the dominator tree looking for interesting conditions. Approach 2 is to inspect other uses of the value being queried for interesting comparisons. From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.
Differential Revision: http://reviews.llvm.org/D7708
llvm-svn: 231879
- Use TargetLowering to check for the actual cost of each extension.
- Provide a factorized method to check for the cost of an extension:
TargetLowering::isExtFree.
- Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able
to tune the cost of non-free extensions.
This refactoring offers a better granularity to model what really happens on
different targets.
No performance changes and very few code differences.
Part of <rdar://problem/19267165>
llvm-svn: 231855
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
llvm-svn: 231840
Follow up from r231505.
Fix the non-determinism by using a MapVector and reintroduce the AArch64
testcase. Defer deleting the got candidates up to the end and remove
them in a bulk, avoiding linear time removal of each element.
Thanks to Renato Golin for trying it out on other platforms.
llvm-svn: 231830
The dependences are now expose through the new getInterestingDependences
API so we can use that with -analyze too and fix the FIXME.
This lets us remove the test that relied on -debug to check the
dependences.
llvm-svn: 231807
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce buildbot failure)
Differential Revision: http://reviews.llvm.org/D7760
llvm-svn: 231800
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.
This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088
Differential Revision: http://reviews.llvm.org/D8086
llvm-svn: 231794
This crash occurs due to memory corruption when trying to update dependency
direction based on Constraints.
This crash was observed during lnt regression of Polybench benchmark test case dynprog.
Review: http://reviews.llvm.org/D8059
llvm-svn: 231788
This crash in Dependency analysis is because we assume here that in case of UsefulGEP
both source and destination have the same number of operands which may not be true.
This incorrect assumption results in crash while populating Pairs. Fix the same.
This crash was observed during lnt regression for code such as-
struct s{
int A[10][10];
int C[10][10][10];
} S;
void dep_constraint_crash_test(int k,int N) {
for( int i=0;i<N;i++)
for( int j=0;j<N;j++)
S.A[0][0] = S.C[0][0][k];
}
Review: http://reviews.llvm.org/D8162
llvm-svn: 231784
We failed to use a marking set to properly handle recursive types, which caused use
to recurse infinitely and eventually overflow the stack.
llvm-svn: 231760
In this situation we would always have already flagged an error on the statepoint intrinsic,
but then we carry on to parse other, related GC intrinsics, and could end up crashing during that
verification when they try to access data from the malformed statepoint.
llvm-svn: 231759
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program. Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.
llvm-svn: 231755
They mark the start of a compile unit, so name them .Lcu_*. Using
Section->getLabelBeginName() makes it looks like they mark the start of the
section.
While at it, switch to createTempSymbol to avoid collisions with labels
created in inline assembly. Not sure if a "don't crash" test is worth it.
With this getLabelBeginName is dead, delete it.
llvm-svn: 231750
CFLAA didn't know how to properly handle ConstantExprs; it would silently
ignore them. This was a problem if the ConstantExpr is, say, a GEP of a global,
because CFLAA wouldn't realize that there's a global there. :)
llvm-svn: 231743
We now treat pointers given to ptrtoint and pointers retrieved from
inttoptr as similar to arguments or globals (can alias anything, etc.)
This solves some of the problems we were having with giving incorrect
results.
llvm-svn: 231741
It turns out accelerator tables where totally broken if they contained
entries with colliding hashes. The failure mode is pretty bad, as it not
only impacted the colliding entries, but would basically make all the
entries after the first hash collision pointing in the wrong place.
The testcase uses the symbol names that where found to collide during a
clang build.
From a performance point of view, the patch adds a sort and a linear
walk over each bucket contents. While it has a measurable impact on the
accelerator table emission, it's not showing up significantly in clang
profiles (and I'd argue that correctness is priceless :-)).
llvm-svn: 231732
This fixes a subtle issue that was introduced in r205153.
When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.
To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.
rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180
llvm-svn: 231721
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.
This reverts commit r229952.
llvm-svn: 231719
Summary:
This removes some duplicated code, and also helps optimization: e.g. in
the test case added, `%idx ULT 128` in `@x` is not currently optimized
to `true` by `-indvars` but will be, after this change.
The only functional change in ths commit is that for add recurrences,
ScalarEvolution::getRange will be more aggressive -- computing the
unsigned (resp. signed) range for a SCEVAddRecExpr will now look at the
NSW (resp. NUW) bits and check for signed (resp. unsigned) overflow.
This can be a strict improvement in some cases (such as the attached
test case), and should be no worse in other cases.
Reviewers: atrick, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8142
llvm-svn: 231709
Summary:
Unused in this commit, but will be used in a subsequent change (D8142)
by a FileCheck test.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8143
llvm-svn: 231708
In the case where just tables are part of the function section, this produces
more readable assembly by avoiding switching to the eh section and back
to .text.
This would also break with non unique section names, as trying to switch to
a unique section actually creates a new one.
llvm-svn: 231677
Summary:
Code is mostly copied from AArch64 port and modified where needed for Mips.
This handles the "non" legal cases of logical ops. Legal cases are handled by tablegen patterns.
Test Plan:
Make check test logopm.ll
All of test-suite passes at O0/O2 and mips32 r1/r2 with this new change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: echristo, llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D6599
llvm-svn: 231665
Fixing this also exposed a related issue where the landingpad under construction was not
cleaned up when an error was raised, which would cause bad reference errors before the
error could actually be printed.
llvm-svn: 231634
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.
llvm-svn: 231632
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
llvm-svn: 231631
Runtime unrollng will introduce a runtime check in loop prologue.
If the unrolled loop is a inner loop, then the proglogue will be inside
the outer loop. LICM pass can help to promote the runtime check out if
the checked value is loop invariant.
llvm-svn: 231630
Summary:
See the two test cases.
; Can fold fcmp with undef on one side by choosing NaN for the undef
; Can fold fcmp with undef on both side
; fcmp u_pred undef, undef -> true
; fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
Reviewers: hfinkel, chandlerc, majnemer
Reviewed By: majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7617
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
There were cases where the backend computed a wrong permute mask for a VPERM2X128 node.
Example:
\code
define <8 x float> @foo(<8 x float> %a, <8 x float> %b) {
%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7>
ret <8 x float> %shuffle
}
\code end
Before this patch, llc (with -mattr=+avx) emitted the following vperm2f128:
vperm2f128 $0, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[0,1,0,1]
With this patch, llc emits a vperm2f128 with a correct permute mask:
vperm2f128 $17, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[2,3,2,3]
Differential Revision: http://reviews.llvm.org/D8119
llvm-svn: 231601
Provide basic support for dynamically loadable coff objects. Only handles a subset of x64 currently.
Patch by Andy Ayers!
Differential Revision: http://reviews.llvm.org/D7793
llvm-svn: 231574
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))
An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.
Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.
Added X86 test and-load-fold.ll
Differential Revision: http://reviews.llvm.org/D8085
llvm-svn: 231563
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.
This prevents many cases of spilling scalar data between the gpr + simd registers.
At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).
Differential Revision: http://reviews.llvm.org/D8132
llvm-svn: 231554
Doing this gets function's low_pc and global variable's locations right
in the output debug info. It also could get right other attributes
that need to be relocated (in linker terms), but I don't know of any
other than the address attributes.
This doesn't fixup low_pc attributes in compile_unit, lexical_block
or inlined subroutine, nor does it get right high_pc attributes
for function. This will come in a subsequent commit.
llvm-svn: 231544
to disable lane switching if we don't actually have the instruction
set we want to switch to. Models the earlier check above the
conditional for the pass.
The testcase is one that triggered with the assert that's added
as part of the fix, use it to avoid adding a new testcase as it
highlights the same problem.
llvm-svn: 231539
Reference attributes are mainly handled by just creating DIEEntry
attributes for them. There is a special case for DW_FORM_ref_addr
attributes though, because the DIEEntry code needs a DwarfDebug
code to emit them (and we don't have one as we do no CodeGen).
In that case, just use DIEInteger attributes with the right form.
llvm-svn: 231531
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!
<rdar://problem/20072968>
llvm-svn: 231527
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))
Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.
Differential Revision: http://reviews.llvm.org/D7622
llvm-svn: 231507
The checking for extgotequiv and localgotequiv rely on the emission
order, which is not guaranteed because we use DenseMap to hold the GOT
equivalents. XFAIL this now until I get time to use MapVector and test
out the solution. In the meantime, appease buildbots.
llvm-svn: 231497
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
-- before
_extgotequiv:
.long _extfoo
_delta:
.long _extgotequiv-_delta
-- after
_delta:
.long L_extfoo$non_lazy_ptr-_delta
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_extfoo$non_lazy_ptr:
.indirect_symbol _extfoo
.long 0
llvm-svn: 231475
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
-- before
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
-- after
.globl _foo
_foo:
.long 42
.globl _delta
Ltmp3:
.long _foo@GOT-Ltmp3
llvm-svn: 231474
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7140
llvm-svn: 231465
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable. Extend this
functionality to all GlobalObjects.
llvm-svn: 231456
We would set the body of a struct type (therefore making it non-opaque)
but were forgetting to move it to the non-opaque set.
Fixes pr22807.
llvm-svn: 231442
At this point, we should have decent coverage of the involved code. I've got a few more test cases to cleanup and submit, but what's here is already reasonable.
I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days. Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.
llvm-svn: 231414
This patch reduces code size for all AVX targets and increases speed for some chips.
SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.
AVX subsequently made the instruction useful by adding a 4-register operand form.
So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.
This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter
No new test cases with this patch because:
1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.
http://llvm.org/bugs/show_bug.cgi?id=22483
Differential Revision: http://reviews.llvm.org/D8063
llvm-svn: 231408
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
llvm-svn: 231396
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.
Differential Revision: http://reviews.llvm.org/D7939
llvm-svn: 231380
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.
llvm-svn: 231371
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
llvm-svn: 231366
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.
Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053
llvm-svn: 231359
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.
llvm-svn: 231332
Improve test robustness in preparation of coming commits:
- Avoid undefs which may get propagated too much.
- Remove several pointless add 0, instructions
llvm-svn: 231307