bitwidth op back to the original size. If we reduce ANDs then this can cause
an endless loop. This patch changes the ZEXT to ANY_EXTEND if the demanded bits
are equal or smaller than the size of the reduced operation.
llvm-svn: 170505
instructions in the assembly code variant if one exists.
The intended use for this is so tools like lldb and darwin's otool(1)
can be switched to print Intel-flavored disassembly.
I discussed extensively this API with Jim Grosbach and we feel
while it may not be fully general, in reality there is only one syntax
for each assembly with the exception of X86 which has exactly
two for historical reasons.
rdar://10989182
llvm-svn: 170477
The bundle flags are now maintained by the slightly higher-level
functions bundleWithPred() / bundleWithSucc() which enforce consistent
bundle flags between neighboring instructions.
See also MIBundleBuilder for an even higher-level approach to building
bundles.
llvm-svn: 170475
The bundle_iterator::operator++ function now doesn't need to dig out the
basic block and check against end(). It can use the isBundledWithSucc()
flag to find the last bundled instruction safely.
Similarly, MachineInstr::isBundled() no longer needs to look at
iterators etc. It only has to look at flags.
llvm-svn: 170473
To not over constrain the scheduler for ARM in thumb mode, some optimizations for code size reduction, specific to ARM thumb, are blocked when they add a dependency (like write after read dependency).
Disables this check when code size is the priority, i.e., code is compiled with -Oz.
llvm-svn: 170462
The bundle-related MI flags need to be kept in sync with the neighboring
instructions. Don't allow the bulk flag-setting setFlags() function to
change them.
Also don't copy MI flags when cloning an instruction. The clone's bundle
flags will be set when it is explicitly inserted into a bundle.
llvm-svn: 170459
Remove the instr_iterator versions of the splice() functions. It doesn't
seem useful to be able to splice sequences of instructions that don't
consist of full bundles.
The normal splice functions that take MBB::iterator arguments are not
changed, and they can move whole bundles around without any problems.
llvm-svn: 170456
The single-element ilist::splice() function supports a noop move:
List.splice(I, List, I);
The corresponding std::list function doesn't allow that, so add a unit
test to document that behavior.
This also means that
List.splice(I, List, F);
is somewhat surprisingly not equivalent to
List.splice(I, List, F, next(F));
This patch adds an assertion to catch the illegal case I == F above.
Alternatively, we could make I == F a legal noop, but that would make
ilist differ even more from std::list.
llvm-svn: 170443
The normal insert() function takes an MBB::iterator position, and
inserts a stand-alone MachineInstr as before.
The insert() function that takes an MBB::instr_iterator position can
insert instructions inside a bundle, and will now update the bundle
flags correctly when that happens.
When the insert position is between two bundles, it is unclear whether
the instruction should be appended to the previous bundle, prepended to
the next bundle, or stand on its own. The MBB::insert() function doesn't
bundle the instruction in that case, use the MIBundleBuilder class for
that.
llvm-svn: 170437
A register can be associated with several distinct register classes.
For example, on PPC, the floating point registers are each associated with
both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code
would fail when provided with a floating point register and an f64 operand
because it would happen to find the register in the F4RC class first and
return that. From the F4RC class, SDAG would extract f32 as the register
type and then assert because of the invalid implied conversion between
the f64 value and the f32 register.
Instead, search all register classes. If a register class containing the
the requested register has the requested type, then return that register
class. Otherwise, as before, return the first register class found that
contains the requested register.
llvm-svn: 170436
instruction.
This isn't strictly necessary at the moment because Thumb2SizeReduction
also copies all MI flags from the old instruction to the new. However, a
future patch will make that kind of direct flag tampering illegal.
llvm-svn: 170395
Most code is oblivious to bundles and uses the MBB::iterator which only
visits whole bundles. MBB::erase() operates on whole bundles at a time
as before.
MBB::remove() now refuses to remove bundled instructions. It is not safe
to remove all instructions in a bundle without deleting them since there
is no way of returning pointers to all the removed instructions.
MBB::remove_instr() and MBB::erase_instr() will now update bundle flags
correctly, lifting individual instructions out of bundles while leaving
the remaining bundle intact.
The MachineInstr convenience functions are updated so
eraseFromParent() erases a whole bundle as before
eraseFromBundle() erases a single instruction, leaving the rest of its bundle.
removeFromParent() refuses to operate on bundled instructions, and
removeFromBundle() lifts a single instruction out of its bundle.
These functions will no longer accidentally split or coalesce bundles -
bundle flags are updated to preserve the existing bundling, and explicit
bundleWith* / unbundleFrom* functions should be used to change the
instruction bundling.
This API update is still a work in progress. I am going to update APIs
first so they maintain bundle flags automatically when possible. Then
I'll add stricter verification of the bundle flags.
llvm-svn: 170384
compilation directory.
This defaults to the current working directory, just as it always has,
but now an assembler can choose to override it with a custom directory.
I've taught llvm-mc about this option and added a test case.
llvm-svn: 170371
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.
llvm-svn: 170353
They seem to work fine.
Patch by: Christian König
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 170343
Patch by: Christian König
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 170342
The Align parameter is a power of two, so 16 results in 64K
alignment. Additional to that even 16 byte alignment doesn't
make any sense, so just remove it.
Patch by: Christian König
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 170341
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.
Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.
The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.
llvm-svn: 170338
TargetLowering::getRegClassFor).
Some isSimple() guards were missing, or getSimpleVT() were hoisted too
far, resulting in asserts on valid LLVM assembly input.
llvm-svn: 170336
Check whether a BB is known as reachable before adding it to the worklist.
This way BB's with multiple predecessors are added to the list no more than
once.
llvm-svn: 170335
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.
This should fix two new asserts that showed up in the vectorize nightly
tester.
llvm-svn: 170333
I noticed this while looking at r170328. We only ever do a vector
rewrite when the alloca *is* the vector type, so it's good to not paper
over bugs here by doing a convertValue that isn't needed.
llvm-svn: 170331
This will allow its use inside of memcpy rewriting as well. This routine
is more complex than extractVector, and some of its uses are not 100%
where I want them to be so there is still some work to do here.
While this can technically change the output in some cases, it shouldn't
be a change that matters -- IE, it can leave some dead code lying around
that prior versions did not, etc.
Yet another step in the refactorings leading up to the solution to the
last component of PR14478.
llvm-svn: 170328
The method helpers all implicitly act upon the alloca, and what we
really want is a fully generic helper. Doing memcpy rewrites is more
special than all other rewrites because we are at times rewriting
instructions which touch pointers *other* than the alloca. As
a consequence all of the helpers needed by memcpy rewriting of
sub-vector copies will need to be generalized fully.
Note that all of these helpers ({insert,extract}{Integer,Vector}) are
woefully uncommented. I'm going to go back through and document them
once I get the factoring correct.
No functionality changed.
llvm-svn: 170325
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).
The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.
Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.
llvm-svn: 170301
No functionality changed. Refactoring leading up to the fix for PR14478
which requires some significant changes to the memset and memcpy
rewriting.
llvm-svn: 170299
Currently there is no instruction encoding info and
XCoreDisassembler::getInstruction() always returns Fail. I intend to add
instruction encodings and tests in follow on commits.
llvm-svn: 170292
Mips16 is really a processor decoding mode (ala thumb 1) and in the same
program, mips16 and mips32 functions can exist and can call each other.
If a jal type instruction encounters an address with the lower bit set, then
the processor switches to mips16 mode (if it is not already in it). If the
lower bit is not set, then it switches to mips32 mode.
The linker knows which functions are mips16 and which are mips32.
When relocation is performed on code labels, this lower order bit is
set if the code label is a mips16 code label.
In general this works just fine, however when creating exception handling
tables and dwarf, there are cases where you don't want this lower order
bit added in.
This has been traditionally distinguished in gas assembly source by using a
different syntax for the label.
lab1: ; this will cause the lower order bit to be added
lab2=. ; this will not cause the lower order bit to be added
In some cases, it does not matter because in dwarf and debug tables
the difference of two labels is used and in that case the lower order
bits subtract each other out.
To fix this, I have added to mcstreamer the notion of a debuglabel.
The default is for label and debug label to be the same. So calling
EmitLabel and EmitDebugLabel produce the same result.
For various reasons, there is only one set of labels that needs to be
modified for the mips exceptions to work. These are the "$eh_func_beginXXX"
labels.
Mips overrides the debug label suffix from ":" to "=." .
This initial patch fixes exceptions. More changes most likely
will be needed to DwarfCFException to make all of this work
for actual debugging. These changes will be to emit debug labels in some
places where a simple label is emitted now.
Some historical discussion on this from gcc can be found at:
http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00623.htmlhttp://gcc.gnu.org/ml/gcc-patches/2008-11/msg01273.html
llvm-svn: 170279
We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases
if y is a constant. DAGCombiner canonicalizes those so we first have to undo the
canonicalization for those cases. The pattern occurs in gzip when the loop
vectorizer is enabled. Part of PR14613.
llvm-svn: 170273
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.
Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs. Reviewed by Dan Gohman.
llvm-svn: 170269
In this case, essentially it is soft float with different library routines.
The next step will be to make this fully interoperational with mips32 floating
point and that requires creating stubs for functions with signatures that
contain floating point types.
I have a more sophisticated design for mips16 hardfloat which I hope to
implement at a later time that directly does floating point without the need
for function calls.
The mips16 encoding has no floating point instructions so one needs to
switch to mips32 mode to execute floating point instructions.
llvm-svn: 170259
immediate generates the narrow version. Needed when doing round-trip
assemble/disassemble testing using the alternate syntax that specifies
'pc' directly.
llvm-svn: 170255
for TLS dynamic models on 64-bit PowerPC ELF. The default sort routine
for relocations only sorts on the r_offset field; but with TLS, there
can be two relocations with the same r_offset. For PowerPC, this patch
sorts secondarily on descending r_type, which matches the behavior
expected by the linker.
llvm-svn: 170237
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI. The ABI will be updated with the new code sequence.
Former sequence:
ld 9,x@got@tprel(2)
add 9,9,x@tls
New sequence:
addis 9,2,x@got@tprel@ha
ld 9,x@got@tprel@l(9)
add 9,9,x@tls
Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.
llvm-svn: 170209
This change moves the code for default shadow propagaition (handleShadowOr)
and origin tracking (setOriginForNaryOp) into a new builder-like class. Also
gets rid of handleShadowOrBinary.
llvm-svn: 170192
The new API is higher level than just manipulating the bundle flags
directly, and the setIsInsideBundle() function will disappear soon.
llvm-svn: 170159
some hackery in place that hid my poor use of TblGen, which I've now sorted
out and cleaned up. No change in observable behavior, so no new test cases.
llvm-svn: 170149