When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.
However, rematerialised constant expressions aren't generated bottom up. The local value save area
grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.
This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.
Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.
However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.
llvm-svn: 236922
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Remove f suffix from constant in double implementations.
- Consolidate implementations using the .cl/.inc approach.
v3:
- Use __CLC_FPSIZE instead of __CLC_FP{32,64}
v4 (Jan Vesely):
- Limit to single precision.
llvm-svn: 236920
Author: dblaikie
Date: Fri May 8 17:47:50 2015
New Revision: 236912
URL: http://llvm.org/viewvc/llvm-project?rev=236912&view=rev
Log:
[opaque pointer type] Cleanup a few references to pointee types using nearby non-pointee types of the same value
& cleanup a convoluted return expression while I'm here
llvm-svn: 236919
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
llvm-svn: 236916
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.
v2:
- Alphabettize SOURCES
v3 (Jan Vesely):
Limit to single precision types.
llvm-svn: 236915
Summary:
This seems to be sufficient to get the tests taking longer than the
previous timeout of 5m to run to completion on Android to pass instead
of timing out.
Reviewers: chaoren
Reviewed By: chaoren
Subscribers: tberghammer, lldb-commits
Differential Revision: http://reviews.llvm.org/D9627
llvm-svn: 236913
This reverts commit 236854, which caused clang-format to always print
'{ "IncompleteFormat": false }' at the top of an incompletely formatted file.
This output causes problems e.g. in Polly's automatic formatting checks. Daniel
tried to fix this in 236867, but this fix had to be reverted due to buildbot
failures. I revert this change as well for now as it is Friday night and
unlikely to be fixed immediately.
llvm-svn: 236908
This is a starting point for using the TargetParser in Clang, in a simple
enough part of the code that can be used without disrupting the crazy
platform support that we need to be compatible with other toolchains.
Also adding a few FIXME on obvious places that need replacing, but those
cases will indeed break a few of the platform assumptions, as arch/cpu names
change multiple times in the driver.
Finally, I'm changing the "neon-vfpv3" behaviour to match standard NEON, since
-mfpu=neon implies vfpv3 by default in both Clang and LLVM. That option
string is still supported as an alias to "neon".
llvm-svn: 236901
This new class in a global context contain arch-specific knowledge in order
to provide LLVM libraries, tools and projects with the ability to understand
the architectures. For now, only FPU, ARCH and ARCH extensions on ARM are
supported.
Current behaviour it to parse from free-text to enum values and back, so that
all users can share the same parser and codes. This simplifies a lot both the
ASM/Obj streamers in the back-end (where this came from), and the front-end
parsers for command line arguments (where this is going to be used next).
The previous implementation, using .def/.h includes is deprecated due to its
inflexibility to be built without the backend support and for being too
cumbersome. As more architectures join this scheme, and as more features of
such architectures are added (such as hardware features, type sizes, etc) into
a full blown TargetDescription class, having a set of classes is the most
sane implementation.
The ultimate goal of this refactor both LLVM's and Clang's target description
classes into one unique interface, so that we can de-duplicate and standardise
the descriptions, as well as make it available for other front-ends, tools,
etc.
The FPU parsing for command line options in Clang has been converted to use
this new library and a number of aliases were added for compatibility:
* A bogus neon-vfpv3 alias (neon defaults to vfp3)
* armv5/v6
* {fp4/fp5}-{sp/dp}-d16
Next steps:
* Port Clang's ARCH/EXT parsing to use this library.
* Create a TableGen back-end to generate this information.
* Run this TableGen process regardless of which back-ends are built.
* Expose more information and rename it to TargetDescription.
* Continue re-factoring Clang to use as much of it as possible.
llvm-svn: 236900
Patch from Geoff Berry <gberry@codeaurora.org>
Fix BackendConsumer::EmitOptimizationMessage() to check if the
DiagnosticInfoOptimizationBase object has a valid location before
calling getLocation() to avoid dereferencing a null pointer inside
getLocation() when no debug info is present.
llvm-svn: 236898
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,.
However, its possible that the source register we are rewriting *to* to also have uses. If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.
This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register.
llvm-svn: 236897
Refactored parts of the hardware loop pass to generate
more. Also, added more tests.
Differential Revision: http://reviews.llvm.org/D9568
llvm-svn: 236896
Summary:
There are several unhandled edge cases in BasicAA's GetLinearExpression
method. This changes fixes outstanding issues, including zext / sext of
a constant with the sign bit set, and the refusal to decompose zexts or
sexts of wrapping arithmetic.
Test Plan: Unit tests added in //q.ext.ll//.
Patch by Nick White.
Reviewers: hfinkel, sanjoy
Reviewed By: hfinkel, sanjoy
Subscribers: sanjoy, llvm-commits, hfinkel
Differential Revision: http://reviews.llvm.org/D6682
llvm-svn: 236894
Thread-safe logging had been disabled because of a deadlock,
possibly due to a lock acquired during a signal handler.
This patch turns thread safe logging back on and also greatly
reduces the scope of the lock, confining it only to the code that
affects the underlying output stream, instead of all the code that
builds up the formatted log message. this should resolve the
issue surrounding the deadlock.
llvm-svn: 236892
A trunc from i32 to i1 on x86_64 generates an instruction such as
%vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9
However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one.
Otherwise, we are killing the source of the truncate which could be used later in the program.
llvm-svn: 236890
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
llvm-svn: 236888
Summary:
I noticed this bug when deubging a WIP on LSR. I wonder whether and how we
should add a regression test for this.
Test Plan: no tests failed.
Reviewers: atrick
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9536
llvm-svn: 236887
The test here was sinking the AND here to a lower BB:
%vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8
TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8
which meant that vreg8 was read after it was killed.
This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND.
llvm-svn: 236886
Functions with available_externally linkage will not be emitted to object
files (they will just be undefined symbols), so it does not make sense to
put them in comdats.
Creates a second overload of maybeSetTrivialComdat that uses the GlobalObject
instead of the Decl, and uses that in several places that had the faulty
logic.
Differential Revision: http://reviews.llvm.org/D9580
llvm-svn: 236879
Improved the AnalyzeBranch, InsertBranch, and RemoveBranch
functions in order to handle more of our branch instructions.
This requires changes to analyzeCompare and PredicateInstructions.
Specifically, we've added support for new value compare jumps,
improved handling of endloop, added more compare instructions,
and improved support for predicate instructions.
Differential Revision: http://reviews.llvm.org/D9559
llvm-svn: 236876
This patch provides generation of .ARM.exidx & .ARM.extab sections which are
used for unwinding. The patch adds new content type typeARMExidx for atoms from
.ARM.exidx section and integration of atoms with such type to the ELF
ReaderWriter. exidx.test has been added with checking of contents of .ARM.exidx
section and .ARM.extab section.
Differential Revision: http://reviews.llvm.org/D9324
llvm-svn: 236873
For PIE executables the type of the main executable is shared object
so we can't force that the main executable should have the type of
executable.
llvm-svn: 236870