LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lay outside of the
loop. We would wrongly perform this operation on Constants.
This fixes PR22337.
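A minimal sketch of the intended guard (the helper name is mine, not the actual LoopRotate code): constants have no live range, so they should be skipped before scanning uses.
static bool hasUseOutsideLoop(Value *V, Loop *L) {
  // Constants are not tracked by live ranges and cannot cause the
  // interference this check guards against.
  if (isa<Constant>(V))
    return false;
  for (User *U : V->users())
    if (auto *I = dyn_cast<Instruction>(U))
      if (!L->contains(I->getParent()))
        return true;
  return false;
}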
llvm-svn: 227171
r227148 added test CommandLineTest.HideUnrelatedOptionsMulti which repeatedly
outputs the following two lines:
-tool: CommandLine Error: Option 'test-option-1' registered more than once!
-tool: CommandLine Error: Option 'test-option-2' registered more than once!
r227154 depends on changes from r227148
llvm-svn: 227167
object that manages a single run of this pass.
This was already essentially how it worked. Within the run function, it
would point members at *stack local* allocations that were only live for
a single run. Instead, it seems much cleaner to have a utility object
whose lifetime is clearly bounded by the run of the pass over the
function and can use member variables in a more direct way.
This also makes it easy to plumb the analyses used into it from the pass
and will make it re-usable with the new pass manager.
No functionality changed here, it's just a refactoring.
llvm-svn: 227162
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.
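As a rough sketch (the helper name is mine; the immediates follow the SSE/AVX cmpps/cmppd predicate encoding), the commutable modes are:
static bool isCommutableCMPImm(unsigned Imm) {
  switch (Imm & 0x7) {
  case 0x00: // EQ - equal
  case 0x03: // UNORD - unordered
  case 0x04: // NEQ - not-equal
  case 0x07: // ORD - ordered
    return true; // symmetric tests: the two sources may be swapped
  default:
    return false;
  }
}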
Differential Revision: http://reviews.llvm.org/D7178
llvm-svn: 227145
This is especially useful for the UTF8 -> UTF16 direction, since
there is no equivalent of llvm::SmallString<> for wide characters.
This means that anyone who wants a null terminated string is forced
to manually push and pop their own null terminator.
Reviewed by: Reid Kleckner.
llvm-svn: 227143
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.
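A hedged sketch of what the immediate rewrite amounts to (the helper name is assumed): bit 0 selects the low/high quadword of the first source and bit 4 selects it for the second, so swapping the sources exchanges those selector bits.
static unsigned commutePCLMULImm(unsigned Imm) {
  unsigned Src1Hi = Imm & 0x01; // low/high selector for src1
  unsigned Src2Hi = Imm & 0x10; // low/high selector for src2
  return (Src1Hi << 4) | (Src2Hi >> 4);
}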
Differential Revision: http://reviews.llvm.org/D7180
llvm-svn: 227141
Need a new API for clang-modernize that allows specifying a list of option categories to remain visible. This will allow clang-modernize to move off getRegisteredOptions.
llvm-svn: 227140
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.
Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of a separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.
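As an illustration only (the helper name is mine, not code from either lowering), a default destination counts as unreachable when its block immediately traps:
static bool hasUnreachableDefault(SwitchInst &SI) {
  BasicBlock *Default = SI.getDefaultDest();
  return isa<UnreachableInst>(Default->getFirstNonPHIOrDbg());
}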
Differential Revision: http://reviews.llvm.org/D6471
llvm-svn: 227125
Instead of creating a pattern like "(p && a) || ((!p) && b)",
just expand the i8 operands to i32 and perform the selp on them.
Fixes PR22246
llvm-svn: 227123
This can also be used instead of the WindowsSupport.h ConvertUTF8ToUTF16
helpers, but that will require massaging some character types. The
Windows support routines want wchar_t output, but wchar_t is often 32
bits on non-Windows OSs.
llvm-svn: 227122
derived classes.
Since global data alignment, layout, and mangling are often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be laid out and mangled consistently if the subtarget
changes on a per-function basis. Prior to this, all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.
*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.
llvm-svn: 227113
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.
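An illustrative source-level analogue (my example, not from the patch), assuming X and V do not alias:
int forward(int *X, volatile int *V) {
  *X = 42;
  int Unrelated = *V;    // volatile load: must stay, and stays ordered vs. other volatiles
  return *X + Unrelated; // the non-volatile reload of *X may still be forwarded as 42
}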
The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case separately in the near future.
Differential Revision: http://reviews.llvm.org/D6901
llvm-svn: 227112
This patch fixes the following miscompile:
define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
%0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind
%a0 = extractelement <2 x double> %0, i32 0
%conv = fptrunc double %a0 to float
%a1 = extractelement <2 x double> %0, i32 1
%conv3 = fptrunc double %a1 to float
tail call void @callee2(float %conv, float %conv3) nounwind
ret void
}
Current codegen:
sqrtsd %xmm0, %xmm1 ## high element of %xmm1 is undef here
xorps %xmm0, %xmm0
cvtsd2ss %xmm1, %xmm0
shufpd $1, %xmm1, %xmm1
cvtsd2ss %xmm1, %xmm1 ## operating on undef value
jmp _callee
This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 )
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 );
this should be the final patch needed to resolve that bug.
Differential Revision: http://reviews.llvm.org/D6885
llvm-svn: 227111
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.
Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.
Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself.
Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.
Differential Revision: http://reviews.llvm.org/D6895
llvm-svn: 227110
This change reverts the interesting parts of 226311 (and 227046). That change introduced two problems, and I've been convinced that an alternate approach is preferable anyway.
The bugs were:
- Registry appears to require all users to be within the same linkage unit. After this change, asking for "statepoint-example" in Transform/ would sometimes get you nullptr, whereas asking the same question in CodeGen would return the right GCStrategy. The correct long term fix is to get rid of the utter hack which is Registry, but I don't have time for that right now. 227046 appears to have been an attempt to fix this, but I don't believe it does so completely.
- GCMetadataPrinter::finishAssembly was being called more than once per GCStrategy. Each Strategy was being added to the GCModuleInfo multiple times.
Once I get time again, I'm going to split GCModuleInfo into the gc.root specific part and a GCStrategy owning Analysis pass. I'm probably also going to kill off the Registry. Once that's done, I'll move the new GCStrategyAnalysis and all built in GCStrategies into Analysis. (As originally suggested by Chandler.) This will accomplish my original goal of being able to access GCStrategy from Transform/ without adding all of the builtin GCs to IR/.
llvm-svn: 227109
Previously using format_hex() would always print a 0x prior to the
hex characters. This allows this to be optional, so that one can
choose to print (e.g.) 255 as either 0xFF or just FF.
Differential Revision: http://reviews.llvm.org/D7151
llvm-svn: 227108
than on MipsSubtargetInfo.
This required a bit of massaging at the MC level since
MC is a) largely a collection of disparate classes with no hierarchy,
and b) there's no overarching equivalent to the TargetMachine, instead
only the subtarget via MCSubtargetInfo (which is the base class of
TargetSubtargetInfo).
We're now storing the ABI at both the TargetMachine level and the
MC level because the AsmParser and the TargetStreamer both need to
know what ABI we have to parse assembly and emit objects. The target
streamer has a pointer to the one in the asm parser and is updated
when the asm parser is created. This is fragile as the FIXME comment
notes, but shouldn't be a problem in practice since we always
create an asm parser before attempting to emit object code via the
assembler. The TargetMachine now contains the ABI so that the DataLayout
can be constructed dependent upon ABI.
All testcases have been updated to use the -target-abi command line
flag so that we can set the ABI without using a subtarget feature.
Should be no change visible externally here.
llvm-svn: 227102
Summary:
This puts all the options that CommandLine.cpp implements into a category so that the APIs that hide options can recognize them by category instead of string-matching a partial list of argument strings.
This patch is pretty simple and straightforward, but it does impact the -help output of all tools using cl::opt. Specifically, the options implemented in CommandLine.cpp (help, help-list, help-hidden, help-list-hidden, print-options, print-all-options, version) are all grouped together into an Option category, and these options are never hidden by the cl::HideUnrelatedOptions API.
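For illustration, a tool opting in looks roughly like this (the option and category names are made up):
#include "llvm/Support/CommandLine.h"
using namespace llvm;
static cl::OptionCategory MyToolCategory("my-tool options");
static cl::opt<bool> MyFlag("my-flag", cl::desc("Tool-specific flag"),
                            cl::cat(MyToolCategory));
int main(int argc, char **argv) {
  // Hides everything outside MyToolCategory from -help; the generic options
  // implemented in CommandLine.cpp remain visible because they now carry
  // their own category.
  cl::HideUnrelatedOptions(MyToolCategory);
  cl::ParseCommandLineOptions(argc, argv);
  return 0;
}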
Reviewers: dexonsmith, chandlerc, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7150
llvm-svn: 227093
Summary:
This patch adds support for some operations that were missing from
128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these
changes we can support the __int128_t and __uint128_t data types
from C/C++.
Depends on D7125
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7143
llvm-svn: 227089
suffix it seems:
# ./config.guess
earmv7hfeb-unknown-netbsd7.99.4
Extend the triple parsing to support this. Avoid running the ARM parser
multiple times because StringSwitch is not lazy.
Reviewers: Renato Golin, Tim Northover
Differential Revision: http://reviews.llvm.org/D7166
llvm-svn: 227085
This reverts commit r227003. Support for addition/subtraction and
various other operations for the i128 data type will be added in a
future commit based on the review D7143.
llvm-svn: 227082
-no-exec-stack. This was due to it not deriving from the correct
asm info base class and missing the override for the exec
stack section query. Added another line to the noexec test
to make sure this doesn't regress.
llvm-svn: 227074
physical register that is described in a DBG_VALUE.
In the testcase the DBG_VALUE describing "p5" becomes unavailable
because the register holding its address is clobbered and we (currently)
aren't smart enough to realize that the value is rematerialized immediately
after the DBG_VALUE and/or is actually a stack slot.
llvm-svn: 227056
Test by Nemanja Ivanovic.
Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.
Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.
In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.
llvm-svn: 227053
A MIPS64 ELF file has a very specific relocation record format. Each
record might specify up to three relocation operations. So the `r_info`
field in fact consists of three relocation type sub-fields and an
optional code for "special" symbols.
http://techpubs.sgi.com/library/manuals/4000/007-4658-001/pdf/007-4658-001.pdf
page 40
The patch implements support of the MIPS64 relocation record format in
yaml2obj/obj2yaml tools by introducing new optional Relocation fields:
Type2, Type3, and SpecSym. These fields are recognized only if the
object/YAML file relates to the MIPS64 target.
Differential Revision: http://reviews.llvm.org/D7136
llvm-svn: 227044
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store sizes of v8i1, v4i1 and v2i1 are changed to 8 bits
llvm-svn: 227043
If two coverage segments cover the same area we need to combine them,
as per r218432. OTOH, just because they start at the same place
doesn't mean they cover the same area. This fixes the check to be more
exact about this.
This is pretty hard to test right now. The frontend doesn't currently
emit regions that start at the same place but don't overlap, but some
upcoming work changes this.
llvm-svn: 227017
Summary:
V8->V9:
- cleanup tests
V7->V8:
- addressed feedback from David:
- switched to range-based 'for' loops
- fixed formatting of tests
V6->V7:
- rebased and adjusted AsmPrinter args
- CamelCased .td, fixed formatting, cleaned up names, removed unused patterns
- diffstat: 3 files changed, 203 insertions(+), 227 deletions(-)
V5->V6:
- addressed feedback from Chandler:
- reinstated full verbose standard banner in all files
- fixed variables that were not in CamelCase
- fixed names of #ifdef in header files
- removed redundant braces in if/else chains with single statements
- fixed comments
- removed trailing empty line
- dropped debug annotations from tests
- diffstat of these changes:
46 files changed, 456 insertions(+), 469 deletions(-)
V4->V5:
- fix setLoadExtAction() interface
- clang-formatted all where it made sense
V3->V4:
- added CODE_OWNERS entry for BPF backend
V2->V3:
- fix metadata in tests
V1->V2:
- addressed feedback from Tom and Matt
- removed top level change to configure (now everything via 'experimental-backend')
- reworked error reporting via DiagnosticInfo (similar to R600)
- added few more tests
- added cmake build
- added Triple::bpf
- tested on linux and darwin
V1 cover letter:
---------------------
Recently Linux gained a "universal in-kernel virtual machine" which is called
eBPF or extended BPF. The name comes from "Berkeley Packet Filter", since the
new instruction set is based on it.
This patch adds a new backend that emits extended BPF instruction set.
The concept and development are covered by the following articles:
http://lwn.net/Articles/599755/
http://lwn.net/Articles/575531/
http://lwn.net/Articles/603983/
http://lwn.net/Articles/606089/
http://lwn.net/Articles/612878/
One of use cases: dtrace/systemtap alternative.
bpf syscall manpage:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe
instruction set description and differences vs classic BPF:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt
Short summary of instruction set:
- 64-bit registers
R0 - return value from in-kernel function, and exit value for BPF program
R1 - R5 - arguments from BPF program to in-kernel function
R6 - R9 - callee saved registers that in-kernel function will preserve
R10 - read-only frame pointer to access stack
- two-operand instructions like +, -, *, mov, load/store
- implicit prologue/epilogue (invisible stack pointer)
- no floating point, no simd
Short history of extended BPF in kernel:
interpreter in 3.15, x64 JIT in 3.16, arm64 JIT, verifier, bpf syscall in 3.18, more to come in the future.
It's a very small and simple backend.
There is no support for global variables, arbitrary function calls, floating point, varargs,
exceptions, indirect jumps, arbitrary pointer arithmetic, alloca, etc.
From the C front-end point of view it's very restricted. This is done on purpose, since the kernel
rejects all programs that it cannot prove safe. It rejects programs with loops
and with memory accesses via arbitrary pointers. When the kernel accepts a program it is
guaranteed that the program will terminate and will not crash the kernel.
This patch implements all 'must have' bits. There are several things on TODO list,
so this is not the end of development.
Most of the code is boilerplate, copy-pasted from other backends.
The only odd things are the lack of < and <= instructions, the specialized load_byte intrinsics
and 'compare and goto' as a single instruction.
Current instruction set is fixed, but more instructions can be added in the future.
Signed-off-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Subscribers: majnemer, chandlerc, echristo, joerg, pete, rengolin, kristof.beyls, arsenm, t.p.northover, tstellarAMD, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6494
llvm-svn: 227008
Summary:
At the moment, address calculation is taking the debug line info from the
address node (e.g. TargetGlobalAddress). When a function is called multiple
times, this results in output of the form:
.loc $first_call_location
.. address calculation ..
.. function call ..
.. address calculation ..
.loc $second_call_location
.. function call ..
.loc $first_call_location
.. address calculation ..
.loc $third_call_location
.. function call ..
This patch makes address calculations for function calls take the debug line
info for the call node and results in output of the form:
.loc $first_call_location
.. address calculation ..
.. function call ..
.loc $second_call_location
.. address calculation ..
.. function call ..
.loc $third_call_location
.. address calculation ..
.. function call ..
All other address calculations continue to use the address node.
Test Plan: Fixes test/DebugInfo/multiline.ll on a mips host.
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D7050
llvm-svn: 227005
Summary:
In addition to the included tests, this fixes
test/CodeGen/Generic/i128-addsub.ll on a mips64 host.
Reviewers: atanasyan, sagar, vmedic
Reviewed By: vmedic
Subscribers: sdkie, llvm-commits
Differential Revision: http://reviews.llvm.org/D6610
llvm-svn: 227003
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.
Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducer.
llvm-svn: 227002
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper function, and adds
a trivial wrapper for the new PM as well. Not much to see here.
I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).
llvm-svn: 226999
changed the IR. This is particularly easy as we can just look for the
existence of any expect intrinsic at all to know whether we've changed
the IR.
llvm-svn: 226998
for small switches, and avoid using a complex loop to set up the
weights.
We know what the baseline weights will be so we can just resize the
vector to contain all that value and clobber the one slot that is
likely. This seems much more direct than the previous code that tested
at every iteration, and started off by zeroing the vector.
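Roughly (the variable names here are stand-ins, not the actual ones): fill the whole vector with the unlikely weight, then overwrite the single likely slot.
SmallVector<uint32_t, 16> Weights(NumSuccessors, UnlikelyWeight);
Weights[LikelyIdx] = LikelyWeight;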
llvm-svn: 226995
It was already in the Scalar header and referenced extensively as being
in this library, the source file was just in the utils directory for
some reason. No actual functionality changed. I noticed as it didn't
make sense to add a pass header to the utils headers.
llvm-svn: 226991
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.
This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.
I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.
llvm-svn: 226987
Eventually we can make some of these pass the error along to the caller.
Reports a fatal error if:
- We find an invalid abbrev record
- We try to get an invalid abbrev number
- We can't fill the current word due to an EOF
Fixed an invalid bitcode test to check for output with FileCheck
Bugs found with afl-fuzz
llvm-svn: 226986
manager to support the actual uses of it. =]
When I ported instcombine to the new pass manager I discovered that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case. What if we have a module
analysis that is *definitionally* impossible to invalidate? In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
llvm-svn: 226981
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.
<rdar://problem/19583480>
llvm-svn: 226978
Handle the poor codegen for i64/x86_mmx->v2i64 (%mm -> %xmm) moves. Instead of
using stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the current behavior prior to
improvements for vector legalization of extloads in r213897.
This commit fixes the regression and as a side effect also removes some
unnecessary shuffles.
In the new attached testcase, we go from:
pshufw $-18, (%rdi), %mm0
movq %mm0, -8(%rsp)
movq -8(%rsp), %xmm0
pshufd $-44, %xmm0, %xmm0
movd %xmm0, %eax
...
To:
pshufw $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd %xmm0, %eax
...
Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324
llvm-svn: 226953
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.
It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.
llvm-svn: 226945
This patch adds a new set of JIT APIs to LLVM. The aim of these new APIs is to
cleanly support a wider range of JIT use cases in LLVM, and encourage the
development and contribution of re-usable infrastructure for LLVM JIT use-cases.
These APIs are intended to live alongside the MCJIT APIs, and should not affect
existing clients.
Included in this patch:
1) New headers in include/llvm/ExecutionEngine/Orc that provide a set of
components for building JIT infrastructure.
Implementation code for these headers lives in lib/ExecutionEngine/Orc.
2) A prototype re-implementation of MCJIT (OrcMCJITReplacement) built out of the
new components.
3) Minor changes to RTDyldMemoryManager needed to support the new components.
These changes should not impact existing clients.
4) A new flag for lli, -use-orcmcjit, which will cause lli to use the
OrcMCJITReplacement class as its underlying execution engine, rather than
MCJIT itself.
Tests to follow shortly.
Special thanks to Michael Ilseman, Pete Cooper, David Blaikie, Eric Christopher,
Justin Bogner, and Jim Grosbach for extensive feedback and discussion.
llvm-svn: 226940
SimplifyCFG currently does this transformation, but I'm planning to remove that
to allow other passes, such as this one, to exploit the unreachable default.
This patch takes care to keep track of what case values are unreachable even
after the transformation, allowing for more efficient lowering.
Differential Revision: http://reviews.llvm.org/D6697
llvm-svn: 226934
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.
Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6987
llvm-svn: 226920
Summary:
We used to silently ignore any empty .module directives and we used to give an error saying that we found
an "unexpected token at start of statement" when the value of the option wasn't an identifier (e.g. if it was a number).
We now give an error saying that we "expected .module option identifier" in both of those cases.
I also fixed the other tests in mips-abi-bad.s, which all seemed to be broken.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7095
llvm-svn: 226905
Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case this
patch allows the constant folding if !TLI->hasFloatingPointExceptions()
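A hedged sketch of the relaxed condition (the helper name is mine):
static bool canConstantFoldFMA(const APFloat &A, const APFloat &B,
                               const APFloat &C, const TargetLowering &TLI) {
  // With FP exceptions disabled there is nothing a non-finite operand could
  // trap on, so folding is always allowed; otherwise require finite operands.
  if (!TLI.hasFloatingPointExceptions())
    return true;
  return A.isFinite() && B.isFinite() && C.isFinite();
}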
Reviewers: resistor
Reviewed By: resistor
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D6912
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 226901
v2: add and enable tests for SI
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226881
v2: use getZExtValue
add missing break
codestyle
v3: add few more comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226880
optimizations can handle removing the Hi part operations.
The generated code is identical for R600, ~10% icount reduction for SI
v2: rebase
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226879
These things are potentially used for non-DWARF data (see the discussion
in PR22235), so take the `Dwarf` out of the name. Since the new name
gives fewer clues, update the doxygen to properly describe what they
are.
llvm-svn: 226874
Minor tweak now that D7042 is complete: we can enable stack folding for (V)MOVDDUP and do proper testing.
Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.
llvm-svn: 226873
I had already factored this analysis specifically to enable doing this,
but hadn't actually committed the necessary wiring to get at this from
the new pass manager. This also nicely shows how the separate cache
object can be directly managed by the new pass manager.
This analysis didn't have any direct tests and so I've added a printer
pass and a boring test case. I chose to print the i1 value which is
being assumed rather than the call to llvm.assume as that seems much
more useful for testing... but suggestions on an even better printing
strategy welcome. My main goal was to make sure things actually work. =]
llvm-svn: 226868
During `MDNode::deleteTemporary()`, call `replaceAllUsesWith(nullptr)`
to update all tracking references to `nullptr`.
This fixes PR22280, where inverted destruction order between tracking
references and the temporaries themselves caused a use-after-free in
`LLParser`.
An alternative fix would be to add an assertion that there are no users,
and continue to fix inverted destruction order in clients (like
`LLParser`), but instead I decided to make getting-teardown-right easy.
(If someone disagrees let me know.)
llvm-svn: 226866
Summary:
Some parsers need references back to the option they are members of. This is used for handling the argument string as well as by the various pass name parsers for making pass names into flags.
Making parsers that need to refer back to the option have a reference to the option eliminates some of the members of various parsers, and enables further code cleanup.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7131
llvm-svn: 226864
Specifically, gc.result benefits from this greatly. Instead of:
gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...
We now have a gc.result.* that can specialize to literally any type.
Differential Revision: http://reviews.llvm.org/D7020
llvm-svn: 226857
This reverts commit r176827.
Björn Steinbrink pointed out that this didn't actually fix the bug
(PR15555) it was attempting to fix.
With this reverted, we can now remove landingpad cleanups that
immediately resume unwinding, converting the invoke to a call.
llvm-svn: 226850
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698.
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.
The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.
This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.
Differential Revision: http://reviews.llvm.org/D6850
llvm-svn: 226845
Currently, we're adding a uint64_t describing the current subtarget so
that matching can check whether the specified register is valid.
However, we want to move to a bitset for those bits (x86 has more than
64 of them).
This can't live in a union so it's probably better to do the checks
early (especially as there are only 3 of them).
llvm-svn: 226841
The ELF format is used on Windows by the MCJIT engine. Thus, on Windows, the
ELFObjectWriter can encounter symbols mangled using the MS Visual Studio C++
name mangling. Symbols mangled using the MSVC C++ name mangling can legally
have "@@@" as a substring. The EFLObjectWriter should not interpret the "@@@"
substring as specifying GNU-style symbol versioning. The ELFObjectWriter
therefore check for the MSVC C++ name mangling prefix which is either "?", "@?",
"imp_?" or "imp_?@".
llvm-svn: 226830
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
Fixed recommit of r226811.
llvm-svn: 226816
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
llvm-svn: 226811
The problem occurs when, after vectorization, we have the type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.
llvm-svn: 226808
Type MVT::i1 became legal in KNL, but the store operation can't be narrowed to this type,
since the size of the VT (1 bit) is not equal to its actual store size (8 bits).
Added a test provided by David (dag@cray.com)
llvm-svn: 226805
Use the struct instead of a std::pair<Value *, Value *>. This makes a
Range an obviously immutable object, and we can now assert that a
range is well-typed (Begin->getType() == End->getType()) on its
construction.
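A minimal sketch of the shape (not the exact IRCE code):
struct Range {
  Value *Begin;
  Value *End;
  Range(Value *B, Value *E) : Begin(B), End(E) {
    // Immutable by convention and well-typed by construction.
    assert(Begin->getType() == End->getType() && "ill-typed range!");
  }
  Value *getBegin() const { return Begin; }
  Value *getEnd() const { return End; }
};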
llvm-svn: 226804
There are places where the inductive range check elimination pass
depends on two llvm::Values or llvm::SCEVs to be of the same
llvm::Type when they do not need to be. This patch relaxes those
restrictions (by bailing out of the optimization if the types
mismatch), and adds test cases to trigger those paths.
These issues were found by bootstrapping clang with IRCE running in
the -O3 pass ordering.
Differential Revision: http://reviews.llvm.org/D7082
llvm-svn: 226793
Even with the current limit on the number of alias checks, the containing loop has quadratic complexity.
This begins to hurt for blocks containing > 1K load/store instructions.
This commit introduces a limit for the loop count. It reduces the runtime for such very large blocks.
llvm-svn: 226792
creating a non-internal header file for the InstCombine pass.
I thought about calling this InstCombiner.h or in some way more clearly
associating it with the InstCombiner class that it is primarily defining,
but there are several other utility interfaces defined within this for
InstCombine. If, in the course of refactoring, those end up moving
elsewhere or going away, it might make more sense to make this the
combiner's header alone.
Naturally, this is a bikeshed to a certain degree, so feel free to lobby
for a different shade of paint if this name just doesn't suit you.
llvm-svn: 226783
ever stored to always use a legal integer type if one is available.
Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.
After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need to flush out if
they remain.
Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.
llvm-svn: 226781
Windows supports a restricted set of relocations (compared to ARM ELF). In some
cases, we may end up generating an unsupported relocation. This can occur with
bad input to the assembler in particular (the frontend should never generate
code that cannot be compiled). Generate an error rather than just aborting.
The change in the API is driven by the desire to provide a slightly more helpful
message for debugging purposes.
llvm-svn: 226779
ScalarEvolution currently lowers a subtraction recurrence to an add
recurrence with the same no-wrap flags as the subtraction. This is
incorrect because `sub nsw X, Y` is not the same as `add nsw X, -Y`
and `sub nuw X, Y` is not the same as `add nuw X, -Y`. This patch
fixes the issue, and adds two test cases demonstrating the bug.
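A concrete 8-bit counterexample (mine, not from the patch) showing why the flags cannot simply be copied across:
#include <cassert>
#include <cstdint>
int main() {
  // Signed: "sub nsw i8 -1, -128" gives 127 without overflow, but rewriting it
  // as an add needs -(-128), which is not representable in i8, so the add form
  // cannot carry nsw.
  int8_t X = -1, Y = INT8_MIN;
  assert(int(X) - int(Y) == 127); // fits in i8: the subtraction never wraps
  assert(-int(Y) == 128);         // does not fit in i8: the negation already wraps
  // Unsigned: 5 - 3 never wraps, but 5 + (uint8_t)-3 == 5 + 253 does, so nuw
  // cannot be copied either.
  uint8_t A = 5, B = 3;
  assert(uint8_t(A - B) == 2);
  assert(unsigned(A) + unsigned(uint8_t(-B)) == 258); // wraps in i8
  return 0;
}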
Differential Revision: http://reviews.llvm.org/D7081
llvm-svn: 226755
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.
The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.
Differential Revision: http://reviews.llvm.org/D7094
llvm-svn: 226745
When two calls from the same MDLocation are inlined they currently get
treated as one inlined function call (creating difficulty debugging,
duplicate variables, etc).
Clang worked around this by including column information on inline calls
which doesn't address LTO inlining or calls to the same function from
the same line and column (such as through a macro). It also didn't
address ctor and member function calls.
By making the inlinedAt locations distinct, every call site has an
explicitly distinct location that cannot be coalesced with any other
call.
This can produce linearly more debug locations (2x in the worst case, where
every call is inlined and the call instruction has a non-call instruction at
the same location). Any increase beyond that is in cases
where the Clang workaround was insufficient and the new scheme is
creating necessary distinct nodes that were being erroneously coalesced
previously.
After this change to LLVM, the incomplete workarounds in Clang can be
removed. That should reduce the number of debug locations (in a build without column
info, the default on Darwin, not the default on Linux) by not creating
pseudo-distinct locations for every call to an inline function.
(oh, and I made the inlined-at chain rebuilding iterative instead of
recursive because I was having trouble wrapping my head around it the
way it was - open to discussion on the right design for that function
(including going back to a recursive solution))
llvm-svn: 226736
Summary: cl::getRegisteredOptions really exposes some of the innards of how command line parsing is implemented. Exposing new APIs that allow us to disentangle client code from implementation details will allow us to make more extensive changes to command line parsing.
Reviewers: chandlerc, dexonsmith, beanz
Reviewed By: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7100
llvm-svn: 226729
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds.
Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.
Also adds a missing tablegen pattern for MOVDDUP.
Differential Revision: http://reviews.llvm.org/D7042
llvm-svn: 226716
Thumbv4t does not have lo->lo copies other than MOVS,
and that can't be predicated. So emit MOVS when needed
and bail if there's a predicate.
http://reviews.llvm.org/D6592
llvm-svn: 226711
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.
This also fixes a case where SplitEditor::removeBackCopies() would miss
the subregister ranges.
llvm-svn: 226690
This cleans up code and is more in line with the general philosophy of
modifying LiveIntervals through LiveIntervalAnalysis instead of changing
them directly.
llvm-svn: 226687
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
1 fewer instruction if BFI is available.
llvm-svn: 226682
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions. This for instance enables
a DAGCombine to fire on code such as
(and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
(zextload ...)
as seen in the testcase changes.
There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.
Differential Revision: http://reviews.llvm.org/D6533
llvm-svn: 226676
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.
From: Andrew Turner <andrew@fubar.geek.nz>
llvm-svn: 226664
Previously we always stored 4 bytes of origin at the destination address
even for 8-byte (and longer) stores.
This should fix rare missing, or incorrect, origin stacks in MSan reports.
llvm-svn: 226658
Implement microMIPS 16-bit unconditional branch instruction B.
The implemented 16-bit microMIPS unconditional branch instruction has the real name B16, and
B is an alias which expands to either B16 or BEQ according to the following rules:
b 256 --> b16 256 # R_MICROMIPS_PC10_S1
b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1
b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1
Differential Revision: http://reviews.llvm.org/D3514
llvm-svn: 226657
Because in its primary function pass the combiner is run repeatedly over
the same function until doing so produces no changes, it is essential
not to re-allocate the worklist. However, as a utility, the more common
pattern would be to put a limited set of instructions in the worklist
rather than the entire function body. That is also the more likely
pattern when used by the new pass manager.
The result is a very light weight combiner that does the visiting with
a separable worklist. This can then be wrapped up in a helper function
for users that want a combiner utility, or as I have here it can be
wrapped up in a pass which manages the iterations used when combining an
entire function's instructions.
Hopefully this removes some of the worst of the interface warts that
became apparent with the last patch here. However, there is clearly more
work. I've again left some FIXMEs for the most egregious. The ones that
stick out to me are the exposure of the worklist and IR builder as
public members, and the use of pointers rather than references. However,
fixing these is likely to be much more mechanical and less interesting
so I didn't want to touch them in this patch.
llvm-svn: 226655
SimplifyLibCalls utility by sinking it into the specific call part of
the combiner.
This will avoid us needing to do any contortions to build this object in
a subsequent refactoring I'm doing and seems generally better factored.
We don't need this utility everywhere and it carries no interesting
state so we might as well build it on demand.
llvm-svn: 226654
a more direct approach: a type-erased glorified function pointer. Now we
can pass a function pointer into this for the easy case and we can even
pass a lambda into it in the interesting case in the instruction
combiner.
I'll be using this shortly to simplify the interfaces to InstCombiner,
but this helps pave the way and seems like a better design for the
libcall simplifier utility.
llvm-svn: 226640
This creates a small internal pass which runs the InstCombiner over
a function. This is the hard part of porting InstCombine to the new pass
manager, as at this point none of the code in InstCombine has access to
a Pass object any longer.
The resulting interface for the InstCombiner is pretty terrible. I'm not
planning on leaving it that way. The key thing missing is that we need
to separate the worklist from the combiner a touch more. Once that's
done, it should be possible for *any* part of LLVM to just create
a worklist with instructions, populate it, and then combine it until
empty. The pass will just be the (obvious and important) special case of
doing that for an entire function body.
For now, this is the first increment of factoring to make all of this
work.
llvm-svn: 226618
don't get muddied up by formatting changes.
Some of these don't really seem like improvements to me, but they also
don't seem any worse and I care much more about not formatting them
manually than I do about the particular formatting. =]
llvm-svn: 226610
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.
This is not a complete solution but works around the most pressing
issue.
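A hedged sketch of the guard (the names and the density threshold are illustrative, not the actual code):
static bool isDenseEnoughForJumpTable(uint64_t NumCases, uint64_t Range,
                                      const TargetLowering &TLI) {
  // A sub-range with fewer cases than the target's minimum jump table size
  // can never become a jump table, so its density should not matter.
  if (NumCases < TLI.getMinimumJumpTableEntries())
    return false;
  return NumCases * 10 >= Range * 4; // roughly 40% density, illustrative only
}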
Review: http://reviews.llvm.org/D7070
llvm-svn: 226600
With the appropriate Verifier changes, extracting the result out of a
statepoint wrapping a vararg function crashes. However, a void vararg
function works fine: commit this first step.
Differential Revision: http://reviews.llvm.org/D7071
llvm-svn: 226599
This reapplies r225379.
ChangeLog:
- The assertion that this commit previously ran into about the inability
to handle indirect variables has since been removed and the backend
can handle this now.
- Testcases were upgrade to the new MDLocation format.
- Instead of keeping a DebugDeclares map, we now use
llvm::FindAllocaDbgDeclare().
Original commit message follows.
Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as
typedef struct { long int a; int b;} S;
int foo(S s) {
return s.b;
}
which at -O1 on x86_64 is codegen'd into
define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
ret i32 %s.coerce1, !dbg !24
}
with this patch we emit the following debug info for this
TAG_formal_parameter [3]
AT_location( 0x00000000
0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
AT_name( "s" )
AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )
Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!
llvm-svn: 226598
ConstantArrays constructed during linking can cause quadratic memory
explosion. An example is the ConstantArrays constructed when linking in
GlobalVariables with appending linkage.
Releasing all unused constants can cause a 20% LTO compile-time
slowdown for a large application. So this commit releases unused ConstantArrays
only.
rdar://19040716. It reduces memory footprint from 20+G to 6+G.
llvm-svn: 226592
We were passing the scratch buffer address to the shaders via user sgprs,
but now we use external symbols and have the driver patch the shader
using reloc information.
llvm-svn: 226586
We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.
llvm-svn: 226584
Implement microMIPS 16-bit unconditional branch instruction B.
The implemented 16-bit microMIPS unconditional branch instruction has the real name B16, and
B is an alias which expands to either B16 or BEQ according to the following rules:
b 256 --> b16 256 # R_MICROMIPS_PC10_S1
b 12256 --> beq $zero, $zero, 12256 # R_MICROMIPS_PC16_S1
b label --> beq $zero, $zero, label # R_MICROMIPS_PC16_S1
Differential Revision: http://reviews.llvm.org/D3514
llvm-svn: 226577
This commit adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132.
It also includes patterns for instruction selection and test cases.
Reviewed by D. Sanders
llvm-svn: 226573
The new code does not create new basic blocks when the shadow is a
compile-time constant; it generates either an unconditional __msan_warning
call or nothing instead.
llvm-svn: 226569
pass and a LoopPrinterPass with the expected associated wiring.
I've added a RUN line to the only test case (!!!) we have that actually
prints loops. Everything seems to be working.
This is somewhat exciting as this is the first analysis using another
analysis to go in for the new pass manager. =D I also believe it is the
last analysis necessary for porting instcombine, but of course I may yet
discover more.
llvm-svn: 226560
This is in preparation for a fix to llvm.org/PR22262. One of the ideas
here is to first find a good jump table range first and then split
before and after it. Thereby, we don't need to use the
split-based-on-density heuristic at all, which can make the "binary
tree" deteriorate in various cases.
Also some minor cleanups.
No functional changes.
llvm-svn: 226551
along with the other analyses.
The most obvious reason why is because eventually I need to separate out
the pass layer from the rest of the instcombiner. However, it is also
probably a compile time win as every query through the pass manager
layer is pretty slow these days.
llvm-svn: 226550
This patch fixes 2 issues in reorderInputsAccordingToOpcode
1) AllSameOpcodeLeft and AllSameOpcodeRight were being calculated incorrectly, resulting in code not being vectorized in a few cases.
2) Adds logic to reorder operands if we get a longer chain of consecutive loads, enabling vectorization. Handled the same for cases where we have AltOpcode.
Thanks Michael for inputs and review.
Review: http://reviews.llvm.org/D6677
llvm-svn: 226547
This reverts commit r226542, effectively reapplying r226540. This time,
initialize `IsEmpty` in the copy and move constructors as well.
llvm-svn: 226545
Now that the clone methods used by `MapMetadata()` don't do any
remapping (and return a temporary), they make more sense as member
functions on `MDNode` (and subclasses).
llvm-svn: 226541
Change `HeaderBuilder` API to work well even when it's not starting with
a tag. There's already one case like this, and the tag is moving
elsewhere as part of PR22235.
llvm-svn: 226540
a DominatorTree argument as that is the analysis that it wants to
update.
This removes the last non-loop utility function in Utils/ which accepts
a raw Pass argument.
llvm-svn: 226537
Rather than relying on updating switch statements correctly, detect
whether `setHash()` exists in the subclass. If so, call
`recalculateHash()` and `setHash(0)` appropriately.
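A minimal sketch of the detection trick (names assumed, not the exact code); overload resolution picks the first version only when N.setHash(0) is well-formed, otherwise the variadic fallback does nothing:
template <class NodeTy>
auto dispatchResetHash(NodeTy &N, int) -> decltype(N.setHash(0), void()) {
  N.setHash(0); // hash is stale; recalculateHash() must run before reuse
}
template <class NodeTy> void dispatchResetHash(NodeTy &, ...) {}
// call site, e.g.: dispatchResetHash(*this, 0);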
llvm-svn: 226531
As part of PR22235, introduce `DwarfNode` and `GenericDwarfNode`. The
former is a metadata node with a DWARF tag. The latter matches our
current (generic) schema of a header with string (and stringified
integer) data and an arbitrary number of operands.
This doesn't move it into place yet; that change will require a large
number of testcase updates.
llvm-svn: 226529
Swap usage of `SubclassData32` and `MDNodeSubclassData`, and rename
`MDNodeSubclassData` to `NumUnresolved`. Small drive-by cleanup to
`countUnresolvedOperands()` since otherwise the name clash with local
vars named `NumUnresolved` would be confusing.
llvm-svn: 226523
As pointed out in r226501, the distinction between `MDNode` and
`UniquableMDNode` is confusing. When we need subclasses of `MDNode`
that don't use all its functionality it might make sense to break it
apart again, but until then this makes the code clearer.
llvm-svn: 226520