We should be keeping track of the writeback on these instructions,
otherwise we're relying on LLVM's stupidity for correct code.
Fortunately, the MC layer can now handle all required constraints,
which means we can get rid of the CodeGen only PseudoInsts too.
llvm-svn: 209426
This changes ARM64 to use separate operands for each component of an
address, and look for separate '[', '$Rn, ..., ']' tokens when
parsing.
This allows us to do away with quite a bit of special C++ code to
handle monolithic "addressing modes" in the MC components. The more
incremental matching of the assembler operands also allows for better
diagnostics when LLVM is presented with invalid input.
Most of the complexity here is with the register-offset instructions,
which were extremely dodgy beforehand: even when the instruction used
wM, LLVM's model had xM as an operand. We papered over this
discrepancy before, but that approach doesn't work now so I split them
into separate X and W variants.
llvm-svn: 209425
Summary: Depends on D3787. Tablegen will raise an assertion without it.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3842
llvm-svn: 209419
Summary:
These emit the 'unknown instruction' instead of the correct error
because they have not been implemented in LLVM for any MIPS ISA.
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3841
llvm-svn: 209418
Summary:
This required me to implement the disassembler for MIPS64r6 since the encodings
are ambiguous with other instructions. This in turn revealed a few
assembly/disassembly bugs which I have fixed.
* da[ht]i only take two operands according to the spec, not three.
* DecodeBranchTarget2[16] correctly handles wider immediates than simm16
* Also made non-functional change to DecodeBranchTarget and
DecodeBranchTargetMM to keep implementation style consistent between
them.
* Difficult encodings are handled by a custom decode method on the most
general encoding in the group. This method will convert the MCInst to a
different opcode if necessary.
DecodeBranchTarget is not currently the inverse of getBranchTargetOpValue
so disassembling some branch instructions emit incorrect output. This seems
to affect branches with delay slots on all MIPS ISA's. I've left this bug
for now and temporarily removed the check for the immediate on
bc[12]eqz/bc[12]nez in the MIPS32r6/MIPS64r6 tests.
jialc and jic crash the disassembler for some reason. I've left these
instructions commented out for the moment.
Depends on D3760
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3761
llvm-svn: 209415
Should be no change in behaviour, but it makes the intended
functionality a bit clearer and means we only have to reason about
real extend operations.
llvm-svn: 209409
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU). This permits the operating system to implement custom behaviour on
this trap. The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend. This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.
llvm-svn: 209390
Now that clang can be used as an assembler via the IAS, invalid assembler inputs
would cause the assertions to trigger. Although we cannot recover from the
errors here, nor provide caret diagnostics, attempt to handle them slightly more
gracefully by reporting a fatal error.
llvm-svn: 209387
constructSubprogramDIE was already called for every subprogram in every
CU when the module was started - there's no need to call it again at
module finalization.
llvm-svn: 209372
This has to do with the trip count computation for loops with multiple
exits, which is quite subtle. Most passes just ask for a single trip
count number, so we must be conservative assuming any exit could be
taken. Normally, we rely on the "exact" trip count, which was
correctly given as "unknown". However, SCEV also gives a "max"
back-edge taken count. The loops max BE taken count is conservatively
a maximum over the max of each exit's non-exiting iterations
count. Note that some exit tests can be skipped so the max loop
back-edge taken count can actually exceed the max non-exiting
iterations for some exits. However, when we know the loop *latch*
cannot be skipped, we can directly use its max taken count
disregarding other exits. I previously took the minimum here without
checking whether the other exit could be skipped. The correct, and
simpler thing to do here is just to directly use the loop latch's max
non-exiting iterations as the loops max back-edge count.
In the problematic test case, the first loop exit had a max of zero
non-exiting iterations, but could be skipped. The loop latch was known
not to be skipped but had max of one non-exiting iteration. We
incorrectly claimed the loop back-edge could be taken zero times, when
it is actually taken one time.
Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count.
Loop %for.body.i: max backedge-taken count is 1.
llvm-svn: 209358
This reverts commit r208930, r208933, and r208975.
It seems not all fission consumers are ready to handle this behavior.
Reverting until tools are brought up to spec.
llvm-svn: 209338
This corrects the emission of IMAGE_REL_ARM_MOV32T relocations. Previously, we
were avoiding the high portion of the relocation too early. If there was a
section-relative relocation with an offset greater than 16-bits (65535), you
would end up truncating the high order bits of the offset. Allow the current
relocation representation to flow through out the MC layer to the object writer.
Use the new ability to restrict recorded relocations to avoid emitting the
relocation into the final object.
llvm-svn: 209337
Add support to allow a target specific COFF object writer to restrict the
recorded resolutions in the emitted object files. This is motivated by the need
in Windows on ARM, where an intermediate relocation needs to be prevented from
being emitted in the object file.
llvm-svn: 209336
Committed in r209178 then reverted in r209251 due to LTO breakage,
here's a proper fix for the case of the missing subprogram DIE. The DIEs
were there, just in other compile units. Using the SPMap we can find the
right compile unit to search for and produce cross-unit references to
describe this kind of inlining.
One existing test case needed to be updated because it had a function
that wasn't in the CU's subprogram list, so it didn't appear in the
SPMap.
llvm-svn: 209335
ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the
second argument.
On the other hand, BLENDI uses 0 to identify the first argument and 1 to
identify the second argument.
Fix the generation of the blend mask to account for this difference.
The bug did not show up with r209043, because we were not checking for the
actual arguments of the blend instruction!
This commit also fixes the test cases.
Note: The same mask works for the BLENDr variant because the arguments are
swapped during instruction selection (see the BLENDXXrr patterns).
<rdar://problem/16975435>
llvm-svn: 209324
Permit active macro expansions when terminating the assembler if there were
errors during the expansion. This would only trigger on invalid input when
built with assertions.
llvm-svn: 209309
The .drectve section should be marked as IMAGE_SCN_LNK_REMOVE. This matches what
the MSVC toolchain does and accurately reflects that this section should not be
emitted into the final binary. This section is merely information for the
linker, comprising of additional linker directives.
llvm-svn: 209273
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle. This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself. So, although the bundle was not
exposing the def, the generated code would be correct because there was no
optimisations being performed. When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.
Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finaliseBundle call after both MIs have been emitted. This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.
The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads. The net
effect of the generated code is equivalent, but, is much more identical to what
is actually being lowered.
llvm-svn: 209267
for undefined symbols, so it matches what COFFObjectFile::getSymbolAddress
does. This allows llvm-nm to print spaces instead of 0’s for the value
of undefined symbols in Mach-O files.
To make this change other uses of MachOObjectFile::getSymbolAddress
are updated to handle when the Value is returned as UnknownAddressOrSize.
Which is needed to keep two of the ExecutionEngine tests working for example.
llvm-svn: 209253
This reverts commit r209178.
This seems to be asserting in an LTO build on some internal Apple
buildbots. No upstream reproduction (and I don't have an LLVM-aware gold
built right now to reproduce it personally) but it's a small patch & the
failure's semi-plausible so I'm going to revert first while I try to
reproduce this.
llvm-svn: 209251
Povray and dealII currently assert with "Overran sorted position" in
AssignTopologicalOrder. The problem is that performPostLD1Combine can
introduce cycles.
Consider:
(insert_vector_elt (INSERT_SUBREG undef,
(load (add %vreg0, Constant<8>), undef), <= A
TargetConstant<2>),
(load %vreg0, undef), <= B
Constant<1>)
This is turned into a LD1LANEpost node. However the address in A is not a
valid user of the post-incremented address of B in LD1LANEpost.
llvm-svn: 209242
Undecided whether this should include a test case - SROA produces bad
dbg.value metadata describing a value for a reference that is actually
the value of the thing the reference refers to. For now, loosening the
assert lets this not assert, but it's still bogus/wrong output...
If someone wants to tell me to add a test, I'm willing/able, just
undecided. Hopefully we'll get SROA fixed soon & we can tighten up this
assertion again.
llvm-svn: 209240
make the functions to set them non-static.
Move and rename the llvm specific backend options to avoid conflicting
with the clang option.
Paired with a backend commit to update.
llvm-svn: 209238
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more that one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.
<rdar://problem/16731508>
llvm-svn: 209230
This change preserves the original algorithm of generating history
for user variables, but makes it more clear.
High-level description of algorithm:
Scan all the machine basic blocks and machine instructions in the order
they are emitted to the object file. Do the following:
1) If we see a DBG_VALUE instruction, add it to the history of the
corresponding user variable. Keep track of all user variables, whose
locations are described by a register.
2) If we see a regular instruction, look at all the registers it clobbers,
and terminate the location range for all variables described by these registers.
3) At the end of the basic block, terminate location ranges for all
user variables described by some register.
Although this change shouldn't be user-visible (the contents of .debug_loc section
should be the same), it changes some internal assumptions about the set
of instructions used to track the variable locations. Watching the bots.
llvm-svn: 209225
In refactoring DwarfUnit::isUnsignedDIType I restricted it to only work
on values with signedness (unsigned or signed), asserting on anything
else (which did uncover some bugs). But it turns out that we do need to
emit constants of signless data, such as pointer constants - only null
pointer constants are known to need this so far, but it's conceivable
that there might be non-null pointer constants at some point (hardcoded
address offsets for device drivers?).
This patch just uses 'unsigned' for signless data such as pointer
constants. Arguably we could use signless representations
(DW_FORM_dataN) instead, allowing a trinary result from isUnsignedDIType
(signed, unsigned, signless), but this seems reasonable for now.
llvm-svn: 209223
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend. We matched an immediate offset with STWX8 even though it only
supports register offset.
The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.
Many thanks to Bill Schmidt for testing this.
llvm-svn: 209219
After discussion with Zoran, we have decided to temporarily revert this commit.
It's causing some difficult to resolve conflicts and we are under time pressure
to deliver an initial MIPS64r6 compiler.
We will re-apply an equivalent patch once the time pressure has passed.
llvm-svn: 209211
This allows the results of a ComplexPattern check to be distributed to separate
named Operands, instead of the current system where all results must apply (and
match perfectly) with a single Operand.
For example, if "some_addrmode" is a ComplexPattern producing two results, you
can write:
def : Pat<(load (some_addrmode GPR64:$base, imm:$offset)),
(INST GPR64:$base, imm:$offset)>;
This should allow neater instruction definitions in TableGen that don't put all
possible aspects of addressing into a single operand, but are still usable with
relatively simple C++ CodeGen idioms.
llvm-svn: 209206
When multiple aliases overlap, the correct string to print can often be
determined purely by considering the InstAlias declarations in some particular
order. This allows the user to specify that order manually when desired,
without resorting to hacking around with the default lexicographical order on
Record instantiation, which is error-prone and ugly.
I was also mistaken about "add w2, w3, w4" being the same as "add w2, w3, w4,
uxtw". That's only true if Rn is the stack pointer.
llvm-svn: 209199
According to Intel Software Optimization Manual on Silvermont in some cases LEA
is better to be replaced with ADD instructions:
"The rule of thumb for ADDs and LEAs is that it is justified to use LEA
with a valid index and/or displacement for non-destructive destination purposes
(especially useful for stack offset cases), or to use a SCALE.
Otherwise, ADD(s) are preferable."
Differential Revision: http://reviews.llvm.org/D3826
llvm-svn: 209198
This workaround (presumably for ancient GDB) doesn't appear to be
required (GDB 7.5 seems to tolerate function definition DIEs in
namespace scope just fine).
llvm-svn: 209189
Since we visit the whole list of subprograms for each CU at module
start, this is clearly true - don't test for the case, just assert it.
A few old test cases seemed to have incomplete subprogram lists, but any
attempt to reproduce them shows full subprogram lists that even include
entities that have been completely inlined and the out of line
definition removed.
llvm-svn: 209178
When I refactored this in r208636 I accidentally caused this to be added
multiple times to each abstract subprogram (not accounting for the
deduplicating effect of the InlinedSubprogramDIEs set).
This got better in r208798 when the abstract definitions got the
attribute added to them at construction time, but still had the
redundant copies introduced in r208636.
This commit removes those excess DW_AT_inlines and relies solely on the
insertion in r208798.
llvm-svn: 209166
The check in DwarfDebug::constructScopeDIE was meant to consider inlined
subroutines as any non-top-level scope that was a subprogram. Instead of
checking "not top level scope" it was checking if the /subprogram's/
scope was non-top-level.
Fix this and beef up a test case to demonstrate some of the missing
inlined_subroutines are no longer missing.
In the course of fixing this I also found that r208748 (with this fix)
found one /extra/ inlined_subroutine in concrete_out_of_line.ll due to
two inlined_subroutines having the same inlinedAt location. The previous
implementation was collapsing these into a single inlined subroutine.
I'm not sure what the original code was that created this .ll file so
I'm not sure if this actually happens in practice today. Since we
deliberately include column information to disambiguate two calls on the
same line, that may've addressed this bug in the frontend, but it's good
to know that workaround isn't necessary for this particular case
anymore.
llvm-svn: 209165
Currently the X86 backend doesn't support types larger than i128 very well. For
example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted.
This fix changes the cost model to never hoist constants for types larger than
i128. Once the codegen issues have been resolved, the cost model can be updated
to allow also larger types.
This is related to <rdar://problem/16954938>
llvm-svn: 209162
Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always
provide the operand size as output if the input operand is zero.
We can take advantage of this knowledge during instruction selection
stage in order to simplify a few corner case.
llvm-svn: 209159
so that llvm-size will total up all the sections in the Berkeley format. This
allows for rough categorizations for Mach-O sections. And allows the total of
llvm-size’s Berkeley and System V formats to be the same.
llvm-svn: 209158
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.
Also added tests for SSE4.1, AVX, and AVX2.
Reviewers: delena, nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3581
llvm-svn: 209156
For GOT relocations the addend should modify the offset to the
GOT entry, not the value of the entry itself. Teach RuntimeDyldMachO
to do The Right Thing here.
Fixes <rdar://problem/16961886>.
llvm-svn: 209154
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
Rather than create a series of function calls to setup the library calls, create
a table with the information and just use the table to drive the configuration
of the library calls. This makes it easier to both inspect the list as well as
to modify it. NFC.
llvm-svn: 209089
While there make getOption return a const reference so we don't have to put it
on the stack when calling methods on it. No functionality change.
llvm-svn: 209088
Windows on ARM uses R11 for the frame pointer even though the environment is a
pure Thumb-2, thumb-only environment. Replicate this behaviour to improve
Windows ABI compatibility. This register is used for fast stack walking, and
thus is part of the Windows ABI.
llvm-svn: 209085
Use the ARMBaseRegisterInfo to query the frame register. The base register info
is aware of the frame register that is used for the frame pointer. Use that to
determine the frame register rather than duplicating the knowledge. Although,
the code path is slightly different in that it may return SP, that can only
occur if the frame pointer has been omitted in the machine function, which is
supposed to contain the desired value in that case.
llvm-svn: 209084
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern. This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions. No functional change beyond the removal of the old
constructors are intended.
llvm-svn: 209082
This is a preliminary step to help ease the construction of CallLoweringInfo.
Changing the construction to a chained function pattern requires that the
parameter be nullable. However, rather than copying the vector, save a pointer
rather than the reference to permit a late binding of the arguments.
llvm-svn: 209080
This patch fixes 3 issues introduced by r209049 that only showed up in on
the sanitizer buildbots. One was a typo in a compare. The other is a check to
confirm that the single differing value in the two incoming GEPs is the same
type. The final issue was the the IRBuilder under some circumstances would
build PHIs in the middle of the block.
llvm-svn: 209065
WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF
for any non-MachO platform. This was a missed site when the original change for
target format support for Windows on ARM was done.
llvm-svn: 209057
were added in SSE2, no SSSE3. Found this while auditing all uses of
SSSE3 in the X86 target. I don't actually expect this to make
a significant difference on anything and I don't have any detailed test
cases but I updated the existing test cases that already covered some of
this code path.
llvm-svn: 209056
Change --functions option in llvm-symbolizer tool to accept
values "none", "short" or "linkage". Update the tests and docs
accordingly.
llvm-svn: 209050
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
rdar://15547484
llvm-svn: 209049
vselects with constant masks, after legalization, will get turned into
specialized shuffle_vectors so they can be matched to blend+imm
instructions.
Fixed some tests.
llvm-svn: 209044
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.
This is not enough for all cases. An additional SELECTCombine optimization
will be committed.
Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.
llvm-svn: 209043
No functionality change intended. The types that previously were set to
lower as Expand or Legal are doing the same thing with this lowering
function.
llvm-svn: 209042
This will allow us to use a single MachineInstr to represent
instructions which behave the same but have different encodings
on some subtargets.
llvm-svn: 209028
This allows us to put dynamic initializers for weak data into the same
comdat group as the data being initialized. This is necessary for MSVC
ABI compatibility. Once we have comdats for guard variables, we can use
the combination to help GlobalOpt fire more often for weak data with
guarded initialization on other platforms.
Reviewers: nlewycky
Differential Revision: http://reviews.llvm.org/D3499
llvm-svn: 209015
I'm not sure this is how it'll be going forward (I'd rather prefer the
definition to be in the main SP mapping, for various reasons) but this
helps me understand how it is today.
llvm-svn: 209009
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
llvm-svn: 209007
DIBuilder maintains this invariant and the current DwarfDebug code could
end up doing weird things if it contained declarations (such as putting
the definition DIE inside a CU that contained the declaration - this
doesn't seem like a good idea, so rather than adding logic to handle
this case we'll just ban in for now & cross that bridge if we come to
it later).
llvm-svn: 209004
Summary:
Analyze the range of values produced by ashr/lshr cst, %V when it is
being used in an icmp.
Reviewers: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3774
llvm-svn: 209000
Summary:
The dividend in an sdiv tells us the largest and smallest possible
results. Use this fact to optimize comparisons against an sdiv with a
constant dividend.
Reviewers: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3795
llvm-svn: 208999
Now the only method to configure ELF section's content and size is to assign
a hexadecimal string to the `Content` field. Unfortunately this way is
completely useless when you need to declare a really large section.
To solve this problem this patch adds one more optional field `Size`
to the `RawContentSection` structure. When yaml2obj generates an ELF file
it uses the following algorithm:
1. If both `Content` and `Size` fields are missed create an empty section.
2. If only `Content` field is missed take section length from the `Size`
field and fill the section by zero.
3. If only `Size` field is missed create a section using data from
the `Content` field.
4. If both `Content` and `Size` fields are provided validate that the `Size`
value is not less than size of `Content` data. Than take section length
from the `Size`, fill beginning of the section by `Content` and the rest
by zero.
Examples
--------
* Create a section 0x10000 bytes long filled by zero
Name: .data
Type: SHT_PROGBITS
Flags: [ SHF_ALLOC ]
Size: 0x10000
* Create a section 0x10000 bytes long starting from 'CA' 'FE' 'BA' 'BE'
Name: .data
Type: SHT_PROGBITS
Flags: [ SHF_ALLOC ]
Content: CAFEBABE
Size: 0x10000
The patch reviewed by Michael Spencer.
llvm-svn: 208995
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
llvm-svn: 208978
Patch replaces old isEquivalentGEP implementation, and changes type of
comparison result from bool (equal or not) to {-1, 0, 1} (less, equal, greater).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 208976
Patch replaces old isEquivalentOperation implementation, and changes type of
comparison result from bool (equal or not) to {-1, 0, 1} (less, equal, greater).
This patch belongs to patch series that improves MergeFunctions
performance time from O(N*N) to O(N*log(N)).
llvm-svn: 208973
TableGen has a fairly dubious heuristic to decide whether an alias should be
printed: does the alias have lest operands than the real instruction. This is
bad enough (particularly with no way to override it), but it should at least be
calculated consistently for both strings.
This patch implements that logic: first get the *correct* string for the
variant, in the same way as the Matcher, without guessing; then count the
number of whitespace chars.
There are basically 4 changes this brings about after the previous
commits; all of these appear to be good, so I have changed the tests:
+ ARM64: we print "neg X, Y" instead of "sub X, xzr, Y".
+ ARM64: we skip implicit "uxtx" and "uxtw" modifiers.
+ Sparc: we print "mov A, B" instead of "or %g0, A, B".
+ Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B"
llvm-svn: 208969
The canonical syntax is "fcmXY ..., #0.0".
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208968
This alias appears not to have an appropriate PrintMethod. Normally, I'd look
into it, but since AArch64 is disappearing soon it's probably not worth it.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208967
These aliases are handled entirely in C++ and only having TableGen InstAliases
for some of them was confusing LLVM.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208966
Certainly not without having a custom PrintMethod to invert the immediate
beforehand. But probably not at all.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208964
In AT&T syntax, we should probably print the full "movl" or "movw". TableGen
used to ignore these aliases because it was miscounting the number of operands.
This fixes the issue.
This will be tested when the TableGen "should I print this Alias"
heuristic is fixed (very soon).
llvm-svn: 208963
Actually, MOV sometimes is canonical, but for now this is a better
approximation than what's there.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208962
You can perform (say) an fcmle operation by swapping the operands on an fcmge,
but it shouldn't be printed like that.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208961
We accept "ldr w3, [x1, #-1]" as a convenience, but we should still print the
canonical "ldur" form.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208960
If an ANDS instruction has Rd == ZR it should be printed as TST since
its only effect is on the flags register NZCV.
This will be tested when the TableGen "should I print this Alias"
heuristic is fixed (very soon).
llvm-svn: 208959
MOV is almost always the right thing to print if possile. People understand it.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208958
For example, the full instruction "sub w0, wzr, w1, uxtw" could print as either
"neg w0, w1" or "sub w0, wzr, w1". The former is better.
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208957
You can write "lslv w0, w1, w2" (probably for legacy reasons), but it should be
printed as simply "lsl".
This will be tested when the TableGen "should I print this Alias" heuristic is
fixed (very soon).
llvm-svn: 208956
Add some Windows on ARM specific library calls. These are provided by msvcrt,
and can be used to perform integer to floating-point conversions (and
vice-versa) mirroring similar functions in the RTABI.
llvm-svn: 208949
Sometimes a LLVM compilation may take more time then a client would like to
wait for. The problem is that it is not possible to safely suspend the LLVM
thread from the outside. When the timing is bad it might be possible that the
LLVM thread holds a global mutex and this would block any progress in any other
thread.
This commit adds a new yield callback function that can be registered with a
context. LLVM will try to yield by calling this callback function, but there is
no guaranteed frequency. LLVM will only do so if it can guarantee that
suspending the thread won't block any forward progress in other LLVM contexts
in the same process.
Once the client receives the call back it can suspend the thread safely and
resume it at another time.
Related to <rdar://problem/16728690>
llvm-svn: 208945
Allow multiple raw profiles to coexist in a single .profraw file,
given the following conditions:
- Zero padding at the end of or between profiles will be skipped.
- Each profile must start with a valid header.
- Mixing endianness or pointer sizes in concatenated profiles files is
not allowed.
This is needed to handle cases where a program's shared libraries are
profiled as well as the main executable itself, as we'll need to emit
each executable's counters. Combining the tables in the runtime would
be expensive for the instrumented program.
rdar://16918688
llvm-svn: 208938
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
llvm-svn: 208934
Since type units in the dwo file are handled by a debug aware tool, they
don't need to leverage the ELF comdat grouping to implement
deduplication. Avoid creating all the .group sections for these as a
space optimization.
llvm-svn: 208930
It is more appropriate than the current situation, when one flag
(AbsoluteFilePath) is relevant only if another flag is set.
This refactoring would also simplify fetching the short function name
(stored in DW_AT_name) instead of a linkage name returned currently.
No functionality change.
llvm-svn: 208921
The allocas going out of scope are immediately killed by the return
instruction.
This is a resend of r208912, which was committed accidentally.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3792
llvm-svn: 208920
We have to iterate over all the calls that were inlined to find out if
any were musttail.
Sink another variable down to where its used.
llvm-svn: 208913
The allocas going out of scope are immediately killed by the return
instruction.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3630
llvm-svn: 208912
The interesting case is what happens when you inline a musttail call
through a musttail call site. In this case, we can't break perfect
forwarding or allow any stack growth.
Instead of merging control flow from the inlined return instruction
after a musttail call into the body of the caller, leave the inlined
return instruction in the caller so that the musttail call stays in the
tail position.
More work is required in http://reviews.llvm.org/D3630 to handle the
case where the inlined function has dynamic allocas or byval arguments.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3491
llvm-svn: 208910
Added target specific combine rules to fold blend intrinsics according
to the following rules:
1) fold(blend A, A, Mask) -> A;
2) fold(blend A, B, <allZeros>) -> A;
3) fold(blend A, B, <allOnes>) -> B.
Added two new tests to verify that the new folding rules work for all
the optimized blend intrinsics.
llvm-svn: 208895
We now use SReg_* for integer types and VReg_* for floating-point types.
This should help simplify the SIFixSGPRCopies pass and no longer causes
ISel to insert a COPY after termiator instuctions that output a value.
This change is covered by exisitng tests.
llvm-svn: 208888
Previously, TableGen assumed that every aliased operand consumed precisely 1
MachineInstr slot (this was reasonable because until a couple of days ago,
nothing more complicated was eligible for printing).
This allows a couple more ARM64 aliases to print so we can remove the special
code.
On the X86 side, I've gone for explicit AT&T size specifiers as the default, so
turned off a few of the aliases that would have just started printing.
llvm-svn: 208880
In all cases, if a "mov" alias exists, it is the canonical form of the
instruction. Now that TableGen can support aliases containing syntax variants,
we can enable them and improve the quality of the asm output.
llvm-svn: 208874
Previously, we ignored the difference between V64 and V128 when parsing
assembly: they both got mapped to registers in the FPR128 class. This is
basically harmless at the moment because they both print and encode the same
way. However, it will affect the printing of aliases.
llvm-svn: 208866
Summary:
No support for symbols in place of the immediate yet since it requires new
relocations.
Depends on D3671
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3689
llvm-svn: 208858
much more effectively when trying to constant fold a load of a constant.
Previously, we only handled bitcasts by trying to find a totally generic
byte representation of the constant and use that. Now, we look through
the bitcast to see what constant we might fold the load into, and then
try to form a constant expression cast of the found value that would be
equivalent to loading the value.
You might wonder why on earth this actually matters. Well, turns out
that the Itanium ABI causes us to create a single array for a vtable
where the first elements are virtual base offsets, followed by the
virtual function pointers. Because the array is homogenous the element
type is consistently i8* and we inttoptr the virtual base offsets into
the initial elements.
Then constructors bitcast these pointers to i64 pointers prior to
loading them. Boom, no more constant folding of virtual base offsets.
This is the first fix to LLVM to address the *insane* performance Eric
Niebler discovered with Clang on his range comprehensions[1]. There is
more to come though, this doesn't *really* fix the problem fully.
[1]: http://ericniebler.com/2014/04/27/range-comprehensions/
llvm-svn: 208856
Summary:
They aren't implemented for any ISA at the moment.
Depends on D3670
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3671
llvm-svn: 208855
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing
Z3 Verifications code for above transform
http://rise4fun.com/Z3/Pmsh
Differential Revision: http://reviews.llvm.org/D3717
llvm-svn: 208848
Summary:
This gets rid of a sub instruction by moving the negation to the
constant when valid.
Reviewers: nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3773
llvm-svn: 208827
Abstract variables should never have/use locations. In this case the
data wasn't used, so no functional change intended here, just
simplification.
llvm-svn: 208820
Many old tests using prior schemas still had some brokenness here (both
indirect arrays and arrays with single bogus elements). Fixed those up
so they don't hit the new assertions.
Also reduced nesting in some places, etc.
llvm-svn: 208817
This is just unneccessary - we only create abstract definitions when
we're inlining anyway, so there's no reason to delay this to see if
we're going to inline anything.
llvm-svn: 208798
If the function has the landingpad instruction, then the
handlerdata should be emitted even if the function has
nouwnind attribute. Otherwise, following code will not
work:
void test1() noexcept {
try {
throw_exception();
} catch (...) {
log_unexpected_exception();
}
}
Since the cantunwind was incorrectly emitted and the
LSDA is not available.
llvm-svn: 208791
For example
tzcntl %edi, %ebx
testl %edi, %edi
je .label
can be rewritten into
tzcntl %edi, %ebx
jb .label
A minor complication is that tzcnt sets CF instead of ZF when the input
is zero, we have to rewrite users of the flags from ZF to CF. Currently
we recognize patterns using lzcnt, tzcnt and popcnt.
Differential Revision: http://reviews.llvm.org/D3454
llvm-svn: 208788
Summary:
Also use named constants for common opcode fields.
Depends on D3669
Reviewers: vmedic, zoran.jovanovic, jkolek
Reviewed By: jkolek
Differential Revision: http://reviews.llvm.org/D3670
llvm-svn: 208784
Most importantly, it gives debug location info to the coverage callback.
This change also removes 2 cases of unnecessary setDebugLoc when IRBuilder
is created with the same debug location.
llvm-svn: 208767
The ELF header e_flags field in the MIPS related test cases handled
incorrectly. The obj2yaml prints too many flags. I will fix that in the
next patches.
The patch reviewed by Michael Spencer and Sean Silva.
llvm-svn: 208752
The UDF instruction is a reserved undefined instruction space. The assembler
mnemonic was introduced with ARM ARM rev C.a. The instruction is not predicated
and the immediate constant is ignored by the CPU. Add support for the three
encodings for this instruction.
The changes to the invalid instruction test is due to the fact that the invalid
instructions actually overlap with the undefined instruction. Introduction of
the new instruction results in a partial decode as an undefined sequence. Drop
the tests as they are invalid instruction patterns anyways.
llvm-svn: 208751
This was reverted in r208642 due to regressions surrounding file changes
within lexical scopes causing inlining information to be lost.
The issue was in LexicalScopes::getOrCreateInlinedScope, where I was
previously testing "isLexicalBlock" which is false for
"DILexicalBlockFile" (a scope used to represent changes in the current
file name) and assuming it was then a function (breaking out of the
inlined scope path and reaching for the parent non-inlined scopes). By
inverting the condition and testing for "isSubprogram" the correct
behavior is attained.
(also found some weirdness in Clang, see r208742 when reducing this test
case - the resulting test case doesn't apply with the Clang fix, but
I've added a more realistic test case to inline-scopes.ll which does
reproduce the issue and demonstrate the fix)
llvm-svn: 208748