This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.
Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.
Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.
The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.
Patch by Ranjeet Singh.
llvm-svn: 218521
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
llvm-svn: 218517
This reverts commit r218513.
Buildbots using libstdc++ issue an error when trying to copy
SmallVector<std::unique_ptr<>>. Revert the commit until we have a fix.
llvm-svn: 218514
Summary:
There will be multiple TypeUnits in an unlinked object that will be extracted
from different sections. Now that we have DWARFUnitSection that is supposed
to represent an input section, we need a DWARFUnitSection<TypeUnit> per
input .debug_types section.
Once this is done, the interface is homogenous and we can move the Section
parsing code into DWARFUnitSection.
Reviewers: samsonov, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5482
llvm-svn: 218513
Summary:
This will allow us to handle f128 arguments without duplicating code from
CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands().
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5292
llvm-svn: 218509
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.
Things to note:
a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.
b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.
c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.
llvm-svn: 218492
code generation options from TargetMachine. This will depend
upon Function + TargetSubtargetInfo based code generation at
which point resetTargetOptions and this code can be removed.
llvm-svn: 218491
No functional change.
I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.
Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.
The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.
llvm-svn: 218480
No functional change.
These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class. The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts. The second
level provides the register and memory variants and some more Pat<>s.
I've compared the td.expanded files before and after. One change is that
ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a
bugfix.
(BTW, this is the change that was blocked on the recent tablegen fix. The
class-instance values X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)
Part of <rdar://problem/17688758>
llvm-svn: 218478
Add SelectionDAG TableGen definitions for BR_CC so that targets can instruction-select
BR_CC using TableGen pattern matching.
Patch by deadal nix.
llvm-svn: 218476
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable. This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.
Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.
<rdar://problem/18021659>
llvm-svn: 218472
llvm::format() is somewhat unsafe. The compiler does not check that integer
parameter size matches the %x or %d size and it does not complain when a
StringRef is passed for a %s. And correctly using a StringRef with format() is
ugly because you have to convert it to a std::string then call c_str().
The cases where llvm::format() is useful is controlling how numbers and
strings are printed, especially when you want fixed width output. This
patch adds some new formatting functions to raw_streams to format numbers
and StringRefs in a type safe manner. Some examples:
OS << format_hex(255, 6) => "0x00ff"
OS << format_hex(255, 4) => "0xff"
OS << format_decimal(0, 5) => " 0"
OS << format_decimal(255, 5) => " 255"
OS << right_justify(Str, 5) => " foo"
OS << left_justify(Str, 5) => "foo "
llvm-svn: 218463
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.
This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.
llvm-svn: 218458
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).
This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
Details of the optimization are discussed in D5091.
Test Plan: make check-all + a new test
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5422
llvm-svn: 218455
These instructions do not indicate they are extendable or the
number of bits in the extendable operand. Rename to match
architected names. Add a testcase for the intrinsics.
llvm-svn: 218453
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.
We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5286
llvm-svn: 218451
On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with
the same type size. Adding that logic to the parser, and generating VBIC
instructions from VAND asm files.
This patch also fixes the validation routines for NEON splat immediates which
were wrong.
Fixes PR20702.
llvm-svn: 218450
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.
Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think its better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.
Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.
This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:
1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
apply
llvm-svn: 218449
256-bit vectors with lane-crossing.
Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.
llvm-svn: 218446
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).
llvm-svn: 218445
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.
This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.
For reference, the reason that we check for the the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.
llvm-svn: 218442
reduce the amount of checking we do here.
The first realization is that only non-crossing cases between 128-bit
lanes are handled by almost the entire function. It makes more sense to
handle the crossing cases first.
THe second is that until we actually are going to generate fancy shared
lowering strategies that use the repeated semantics of the v8i16
lowering, we should waste time checking for repeated masks. It is
simplest to directly test for the entire unpck masks anyways, so we
gained nothing from this.
This also matches the structure of v32i8 more closely.
No functionality changed here.
llvm-svn: 218441
lowering.
This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.
llvm-svn: 218440
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.
llvm-svn: 218439
shuffles rather than word shuffles.
As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.
llvm-svn: 218438
Nico Rieck added support for this 32-bit COFF relocation some time ago
for Win64 stuff. It appears that as an oversight, the assembly output
used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse.
Sadly, there were actually tests that took in IMGREL and put out
IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can
assemble it's own output with slightly more fidelity.
llvm-svn: 218437
for this now.
Should prevent folks from running afoul of this and not knowing why
their code won't instruction select the way I just did...
llvm-svn: 218436
missing test cases for it.
Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.
llvm-svn: 218434
If we have multiple coverage counts for the same segment, we need to
add them up rather than arbitrarily choosing one. This fixes that and
adds a test with template instantiations to exercise it.
llvm-svn: 218432
lowering.
This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.
Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.
llvm-svn: 218430
get the literal string “Hello world” printed as a comment on the instruction
that loads the pointer to it. For now this is just for x86_64. So for object
files with relocation entries it produces things like:
leaq L_.str(%rip), %rax ## literal pool for: "Hello world\n"
and similar for fully linked images like executables:
leaq 0x4f(%rip), %rax ## literal pool for: "Hello world\n"
Also to allow testing against darwin’s otool(1), I hooked up the existing
-no-show-raw-insn option to the Mach-O parser code, added the new Mach-O
only -full-leading-addr option to match otool(1)'s printing of addresses and
also added the new -print-imm-hex option.
llvm-svn: 218423
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.
Patch by Charlie Turner.
llvm-svn: 218408
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor. They were unconditionally assuming the object data are
little-endian. Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.
Patch by Charlie Turner.
llvm-svn: 218407
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.
A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.
Patch by Charlie Turner.
llvm-svn: 218406
Use the same environment when invoking llvm-config from lit.cfg as
will be used when running tests, so that ASAN_OPTIONS, INCLUDE, etc.
are present.
llvm-svn: 218403
into unblended shuffles and a blend.
This is the consistent fallback for the lowering paths that have fast
blend operations available, and its getting quite repetitive.
No functionality changed.
llvm-svn: 218399
This reverts commit faac033f7364bb4226e22c8079c221c96af10d02.
The test depends on all targets to be enabled in llc in order to pass,
and needs to be rewritten/refactored to not have that dependency.
llvm-svn: 218393
For biendian targets like ARM and AArch64, it is useful to have the
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.
Patch by Charlie Turner.
llvm-svn: 218391
This change fixes the ARM and AArch64 relocation visitors in
RelocVisitor. They were unconditionally assuming the object data are
little-endian. Tests have been added to ensure that the
llvm-dwarfdump utility does not crash when processing big-endian
object files.
Patch by Charlie Turner.
llvm-svn: 218389
This change replaces the brittle if/else chain of string comparisons
with a switch statement on the detected target triple, removing the
need for testing arbitrary architecture names returned from
getFileFormatName, whose primary purpose seems to be for display
(user-interface) purposes. The visitor now takes a reference to the
object file, rather than its arbitrary file format name to figure out
whether the file is a 32 or 64-bit object file and what the detected
target triple is.
A set of tests have been added to help show that the refactoring processes
relocations for the same targets as the original code.
Patch by Charlie Turner.
llvm-svn: 218388
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.
The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.
Differential Revision: http://reviews.llvm.org/D5305
llvm-svn: 218387
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.
This is effectively a (heavily bug-fixed) rewrite of r208992.
llvm-svn: 218386
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.
llvm-svn: 218382
pool data being loaded into a vector register.
The comments take the form of:
# ymm0 = [a,b,c,d,...]
# xmm1 = <x,y,z...>
The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.
My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.
Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.
llvm-svn: 218377
attempt didn't work out so well. It looks like it will be much better
for introducing extra logic to find a shuffle mask if the finding logic
is totally separate. This also makes it easy to sink the opcode logic
completely out of the routine so we don't re-dispatch across it.
Still no functionality changed.
llvm-svn: 218363
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.
llvm-svn: 218362
from the MachineInstr into the caller which is already doing a switch
over the instruction.
This will make it more clear how to compute different operands to feed
the comment selection for example.
Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).
No functionality changed.
llvm-svn: 218361
vector shuffles.
This is just the beginning by hoisting it into its own function and
making use of early exit to dramatically simplify the flow of the
function. I'm going to be incrementally refactoring this until it is
a bit less magical how this applies to other instructions, and I can
teach it how to dig a shuffle mask out of a register. Then I plan to
hook it up to VPERMD so we get our mask comments for it.
No functionality changed yet.
llvm-svn: 218357
The previous implementation was extending the live range of SGPRs
by modifying the live intervals directly. This was causing a lot
of machine verification errors when the machine scheduler was enabled.
The new implementation adds pseudo instructions with implicit uses to
extend the live ranges of SGPRs, which works much better.
llvm-svn: 218351
Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
VCC_HI, and M0. The previous implementation would assertion fail
when passed these registers.
llvm-svn: 218349
VGPRs are spilled to LDS. This still needs more testing, but
we need to at least enable it at -O0, because the fast register
allocator spills all registers that are live at the end of blocks
and without this some future commits will break the
flat-address-space.ll test.
v2: Only calculate thread id once
v3: Move insertion of spill instructions to
SIRegisterInfo::eliminateFrameIndex()
llvm-svn: 218348
the native AVX2 instructions.
Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.
llvm-svn: 218347
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.
Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.
llvm-svn: 218346
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.
llvm-svn: 218341
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.
Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.
llvm-svn: 218338
Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them. Only
remove the entries that we know we can optimize away. This way we don't
need to wire through priority and comdats or anything else we might add.
Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.
llvm-svn: 218337
e.g., add w1, w2, w3, lsl #(2 - 1)
This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.
rdar://18430542
llvm-svn: 218336
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.
Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.
llvm-svn: 218335
There are new register classes VCSrc_* which represent operands that
can take an SGPR, VGPR or inline constant. The VSrc_* class is now used
to represent operands that can take an SGPR, VGPR, or a 32-bit
immediate.
This allows us to have more accurate checks for legality of
immediates, since before we had no way to distinguish between operands
that supported any 32-bit immediate and operands which could only
support inline constants.
llvm-svn: 218334
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.
Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).
Test Plan: make check-all (lots of tests for this functionality already exist)
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5404
llvm-svn: 218332
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.
I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.
Test Plan: new test + make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5180
llvm-svn: 218331
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.
In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.
If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.
This should not cause any functionnal change so the existing tests are used
and not modified.
Test Plan: make check-all, benefits from existing tests of atomics on ARM
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5179
llvm-svn: 218329
VPBLENDD where appropriate even on 128-bit vectors.
According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.
Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.
llvm-svn: 218322
This patch removes the old JIT memory manager (which does not provide any
useful functionality now that the old JIT is gone), and migrates the few
remaining clients over to SectionMemoryManager.
http://llvm.org/PR20848
llvm-svn: 218316
The function deleteBody() converts the linkage to external and thus destroys
original linkage type value. Lack of correct linkage type causes wrong
relocations to be emitted later.
Calling dropAllReferences() instead of deleteBody() will fix the issue.
Differential Revision: http://reviews.llvm.org/D5415
llvm-svn: 218302
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.
A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.
The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.
llvm-svn: 218301
trick that I missed.
VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.
However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.
There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.
llvm-svn: 218300
This includes constants, attributes, and some additional instructions not covered by previous tests.
Work was done by lama.saba@intel.com.
llvm-svn: 218297
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK_NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.
llvm-svn: 218284
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.
llvm-svn: 218282
Comparing ModuleName to the file names listed will
always fail.
I wonder how this code ever worked and what its
purpose was. Why exclude the msvc runtime DLLs
but not exclude all Windows system DLLs?
Anyhow, it does not function as intended.
clang-formatted as well.
llvm-svn: 218276
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.
This should fix a bug found by Chad.
llvm-svn: 218275
Summary:
This fixes a couple of issues. One is ensuring that AOK_Label rewrite
rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to
be able to skip the brackets properly. The other part of the fix ensures
that we don't overwrite Identifier when looking up the identifier, and
that we use the locally available information to generate the AOK_Label
rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm
would be problematic since the Start location there may point to the
beginning of a bracket expression, and not necessarily the beginning of
an identifier.
This also means that we don't need to carry around the InternlName field,
which helps simplify the code.
Test Plan: This will be tested on the clang side.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5445
llvm-svn: 218270
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.
Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.
As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.
llvm-svn: 218267
Summary:
r218229 made this function return a dummy nullptr in order to avoid
API breakage between clang/llvm.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5432
llvm-svn: 218266
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels,
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.
The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.
Differential Revision: http://reviews.llvm.org/D5347
llvm-svn: 218263
This reverts commit r218254.
The global_atomics.ll test fails with asserts disabled. For some reason,
the compiler fails to produce the atomic no return variants.
llvm-svn: 218257
BypassSlowDiv is used by codegen prepare to insert a run-time
check to see if the operands to a 64-bit division are really 32-bit
values and if they are it will do 32-bit division instead.
This is not useful for R600, which has predicated control flow since
both the 32-bit and 64-bit paths will be executed in most cases. It
also increases code size which can lead to more instruction cache
misses.
llvm-svn: 218252
ISD::MUL and ISD:UMULO are the same except that UMULO sets an overflow
bit. Since we aren't using the overflow bit, we should use ISD::MUL.
llvm-svn: 218251
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.
Test Plan: tests updated with x32 target
Reviewers: nadav, rafael, dschuff
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D5245
llvm-svn: 218247
Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example.
Reviewers: samsonov, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5394
llvm-svn: 218245
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector). The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics. The
instruction only uses the least significant 4 bits. This change removes the
assert and masks the index to match the instruction behaviour.
llvm-svn: 218242
We currently emit an error when trying to assemble a file with more
than one section using DWARF2 debug info. This should be a warning
instead, as the resulting file will still be usable, but with a
degraded debug illusion.
llvm-svn: 218241
As of July 2014, all backends have been updated to implement
AtomicRMWInst::Nand as ~(x & y) (and not as x & ~y, as some did previously).
This was added to the release notes in r212635 (and the LangRef had been
changed), but it seems that we forgot to update the header-file description.
llvm-svn: 218236
for LVI algorithm. For a specific value to be lowered, when the number of basic
blocks being checked for overdefined lattice value is larger than
lvi-overdefined-BB-threshold, or the times of encountering overdefined value
for a single basic block is larger than lvi-overdefined-threshold, the LVI
algorithm will stop further lowering the lattice value.
llvm-svn: 218231
The implementation of the callback in clang's Sema will return an
internal name for labels.
Test Plan: Will be tested in clang.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4587
llvm-svn: 218229
a more sane approach to AVX2 support.
Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.
The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.
Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.
Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.
llvm-svn: 218228
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.
I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.
llvm-svn: 218226
in the high and low 128-bit lanes of a v8f32 vector.
No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]
llvm-svn: 218224
This is purely a plumbing patch. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.
Next steps:
The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.
Differential Revision: http://reviews.llvm.org/D5425
llvm-svn: 218219
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.
This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.
This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.
llvm-svn: 218216
a generic vector shuffle mask into a helper that isn't specific to the
other things that influence which choice is made or the specific types
used with the instruction.
No functionality changed.
llvm-svn: 218215
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.
llvm-svn: 218214
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.
With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...
llvm-svn: 218213
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.
This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.
llvm-svn: 218211
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.
llvm-svn: 218210
v8f32 shuffles in the new vector shuffle lowering code.
This is very cheap to do and makes it much more clear that anything more
expensive but overlapping with this lowering should be selected
afterward (for example using AVX2's VPERMPS). However, no functionality
changed here as without this code we would fall through to create no-op
shuffles of each input and a blend. =]
llvm-svn: 218209
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.
llvm-svn: 218208
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.
llvm-svn: 218206
awkward conditions. The readability improvement of this will be even
more important as I generalize it to handle more types.
No functionality changed.
llvm-svn: 218205
128-bit lane crossings, not 'half' crossings. This came up in code
review ages ago, but I hadn't really addresesd it. Also added some
documentation for the helper.
No functionality changed.
llvm-svn: 218203
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.
The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.
llvm-svn: 218202
link.exe:
Fuzz testing has shown that COMMON symbols with size > 32 will always
have an alignment of at least 32 and all symbols with size < 32 will
have an alignment of at least the largest power of 2 less than the size
of the symbol.
binutils:
The BFD linker essentially work like the link.exe behavior but with
alignment 4 instead of 32. The BFD linker also supports an extension to
COFF which adds an -aligncomm argument to the .drectve section which
permits specifying a precise alignment for a variable but MC currently
doesn't support editing .drectve in this way.
With all of this in mind, we decide to play a little trick: we can
ensure that the alignment will be respected by bumping the size of the
global to it's alignment.
llvm-svn: 218201
under AVX.
This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.
This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]
llvm-svn: 218199
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.
llvm-svn: 218193
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.
llvm-svn: 218190
We had a few bugs:
- We were considering the GVKind instead of just looking at the section
characteristics
- We would never print out 'y' when a section was meant to be unreadable
- We would never print out 's' when a section was meant to be shared
- We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant
IMAGE_SCN_LNK_REMOVE
llvm-svn: 218189
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.
While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.
llvm-svn: 218188
This patch modifies RTDyldMemoryManager::getSymbolAddress(Name)'s behavior to
make it consistent with how clients are using it: Name should be mangled, and
getSymbolAddress should demangle it on the caller's behalf before looking the
name up in the process. This patch also fixes the one client
(MCJIT::getPointerToFunction) that had been passing unmangled names (by having
it pass mangled names instead).
Background:
RTDyldMemoryManager::getSymbolAddress(Name) has always used a re-try mechanism
when looking up symbol names in the current process. Prior to this patch
getSymbolAddress first tried to look up 'Name' exactly as the user passed it in
and then, if that failed, tried to demangle 'Name' and re-try the look up. The
implication of this behavior is that getSymbolAddress expected to be called with
unmangled names, and that handling mangled names was a fallback for convenience.
This is inconsistent with how clients (particularly the RuntimeDyldImpl
subclasses, but also MCJIT) usually use this API. Most clients pass in mangled
names, and succeed only because of the fallback case. For clients passing in
mangled names, getSymbolAddress's old behavior was actually dangerous, as it
could cause unmangled names in the process to shadow mangled names being looked
up.
For example, consider:
foo.c:
int _x = 7;
int x() { return _x; }
foo.o:
000000000000000c D __x
0000000000000000 T _x
If foo.c becomes part of the process (E.g. via dlopen("libfoo.dylib")) it will
add symbols 'x' (the function) and '_x' (the variable) to the process. However
jit clients looking for the function 'x' will be using the mangled function name
'_x' (note how function 'x' appears in foo.o). When getSymbolAddress goes
looking for '_x' it will find the variable instead, and return its address and
in place of the function, leading to JIT'd code calling the variable and
crashing (if we're lucky).
By requiring that getSymbolAddress be called with mangled names, and demangling
only when we're about to do a lookup in the process, the new behavior
implemented in this patch should eliminate any chance of names being shadowed
during lookup.
There's no good way to test this at the moment: This issue only arrises when
looking up process symbols (not JIT'd symbols). Any test case would have to
generate a platform-appropriate dylib to pass to llvm-rtdyld, and I'm not
aware of any in-tree tool for doing this in a portable way.
llvm-svn: 218187
This splits the logic for actually looking up coverage information
from the logic that displays it. These were tangled rather thoroughly
so this change is a bit large, but it mostly consists of moving things
around. The coverage lookup logic itself now lives in the library,
rather than being spread between the library and the tool.
llvm-svn: 218184
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable. Marking the
section as writable interferes with section merging.
This fixes PR21009.
llvm-svn: 218179
tricky case of single-element insertion into the zero lane of a zero
vector.
We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.
This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?
llvm-svn: 218177
analysis used elsewhere. This removes the last duplicate of this logic.
Also simplify the code here quite a bit. No functionality changed.
llvm-svn: 218176
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.
This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.
llvm-svn: 218175
vector lanes can be modeled as zero with a call to the new function that
computes a bit-vector representing that information.
No functionality changed here, but will allow doing more clever things
with the zero-test.
llvm-svn: 218174
I just tried reproducing some of the optimization failures in README.txt in the
X86 backend, and many of them could not be reproduced. In general the entire
file appears quite bit-rotted, whatever interesting parts remain should be
moved to bugzilla, and the rest deleted. I did not spend the time to do that,
so I just deleted the few I tried reproducing which are obsolete, to save some
time to whoever will find the courage to do it.
llvm-svn: 218170
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.
There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.
llvm-svn: 218161
In r217636, the value stored in KernelInfo.Num[VS]GPRSs was changed from
the highest GPR index used to the number of gprs in order to be
consistent with the name of the variable.
The code writing the config values still assumed that the value in this
variable was the highest GPR index used, which caused the compiler to
over report the number of GPRs being used.
https://bugs.freedesktop.org/show_bug.cgi?id=84089
llvm-svn: 218150
Summary: This is part of the overall goal of removing static initializers from LLVM.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D5416
llvm-svn: 218149
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.
Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo muny instructions.
llvm-svn: 218143
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. Since we've now got the ability to make these
determinations in the backend (funnily enough - we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag, but go back to producing ranges under -gmlt).
Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.
Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here)
Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).
llvm-svn: 218129
Summary:
This will allow to request the creation of a forward delacred variable
at is point of use (for imported declarations, this will be
DwarfDebug::constructImportedEntityDIE) rather than having to put the
forward decl in a retention list.
Note that getOrCreateGlobalVariable returns the actual definition DIE when the
routine creates a declaration and a definition DIE. If you agree this is the
right behavior, then I'll have a followup patch that registers the definition
in the DIE map instead of the declaration as it is today (this 'breaks' only
one test, where we test that the imported entity is the declaration). I'm
not sure what's best here, but it's easy enough for a consumer to follow the
DW_AT_specification link to get to the declaration, whereas it takes more
work to find the actual definition from a declaration DIE.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5381
llvm-svn: 218126
Summary:
getFileNameForUnit() is basically a wrapper around LineTable::getFileNameByIndex().
Fold its additional functionality (adding the DWARFUnit compilation dir) into
LineTable::getFileNameByIndex().
getFileLineInfoForCompileUnit() is a wrapper around getFileNameForUnit(). As
a function to search the line information by address, it seems natural to put
it in the LineTable also.
Before this commit only the Context with its private helpers could do Linetable
lookups. This newly exposed feature will be used by the DIE dumping code to
get access to file information referenced in DIE attributes.
This commit has already been partly reviewed in D5192 and contained an
additional and a bit controversial 'realpath' call that is left out of this
patch. We can reinstate that realpath code later if it is desirable.
Test Plan:
The patch contains no tests as it should be functionally equivalent to the
previous code. As requested in the last review, I checked if the relative
path handling copied from the Context to LineTable::getFileNameByIndex()
was covered, and indeed the symbolizer tests fail if it is removed.
Reviewers: dblaikie, echristo, aprantl, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5354
llvm-svn: 218125
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
Patch by Olivier Sallenave, thanks!
llvm-svn: 218120
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.
llvm-svn: 218115
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.
llvm-svn: 218114
shuffles that are zext-ing.
Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.
llvm-svn: 218112
The filename-equivalence flag allows you to show coverage when your
source files don't have the same full paths as those that generated
the data. This is mostly useful for writing tests in a cross-platform
way.
This wasn't triggering in cases where the filename was derived
directly from the coverage data, which meant certain types of test
case were impossible to write. This patch fixes that, and following
patches involve tests that need this.
llvm-svn: 218108
to the new vector shuffle lowering code.
This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.
One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.
llvm-svn: 218102
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.
llvm-svn: 218101
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).
llvm-svn: 218098
This format is simply a regular object file with the bitcode stored in a
section named ".llvmbc", plus any number of other (non-allocated) sections.
One immediate use case for this is to accommodate compilation processes
which expect the object file to contain metadata in non-allocated sections,
such as the ".go_export" section used by some Go compilers [1], although I
imagine that in the future we could consider compiling parts of the module
(such as large non-inlinable functions) directly into the object file to
improve LTO efficiency.
[1] http://golang.org/doc/install/gccgo#Imports
Differential Revision: http://reviews.llvm.org/D4371
llvm-svn: 218078
The fix is slightly different then x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).
<rdar://problem/18352998>
llvm-svn: 218076
- Replace std::unordered_map with DenseMap
- Use std::pair instead of manually combining two unsigneds
- Assert if insert is called with invalid arguments
- Avoid an unnecessary copy of a std::vector
llvm-svn: 218074
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, commited and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm me whether
initialization lists for plain C arrays are supported by every toolchain used
to build llvm ? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;
Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
Thanks to luqmana for having noticed this bug.
```
Test Plan: Added more cases to atomic-load-store.ll + make check-all
Reviewers: jfb, t.p.northover, luqmana
Subscribers: llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D5386
llvm-svn: 218066
The patch moved some logic around in an attempt to generate potentially more
DW_AT_declaration attributes. The patch was flawed though and it stopped
generating the attribute in some cases.
llvm-svn: 218060
Turns out Clang's -Woverloaded-virtual is enabled by -Wall in both CMake
and Configure builds. We were only explicitly specifying it (thus
enabling GCC's version of the warning) in the Configure build.
The specific case of interest is:
struct base {
virtual void func();
virtual void func(int);
};
struct derived: base {
virtual void func(); // GCC warns here, because this causes
// func(int) to be hidden
};
I don't think that's worth getting fussed about (& Clang (indirectly
me... since I improved this warning in Clang) agrees or we would've made
the warning catch these cases.
Technically this could still lead to bugs/confusion if base had
func(int) and func(bool), derived overrode func(bool) and then a caller
with a derived object tried to call func(42) - it would silently call
func(bool). We should probably improve clang's warnings to catch this at
the call site at some point.
llvm-svn: 218059
As suggested by David Blaikie, this may be easier to read.
The original warning was:
../tools/llvm-cov/llvm-cov.cpp:53:49: error: ISO C++ forbids zero-size array 'argv' [-Werror=pedantic]
std::string Invocation(std::string(argv[0]) + " " + argv[1]);
It seems to be the case that GCC's warning gets confused and thinks
'argv' is a declaration here. GCC bugzilla issue #61259.
llvm-svn: 218048
Summary:
This doesn't show up today as we don't emit decalration only variables. This
will be tested when the followup patches implementing import of forward
declared entities lands in clang.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5382
llvm-svn: 218041
The current code is only able to return the right unit if the passed offset
is the exact offset of a section. Generalize the search function by comparing
againt the offset of the next unit instead and by switching the search
algorithm to upper_bound.
This way, the unit returned is the first unit with a getNextUnitOffset()
strictly greater than the searched offset, which is exactly what we want.
Note that there is no need for testing the range of the resulting unit as
the offsets of a DWARFUnitSection are in a single contiguous range from
0 inclusive to lastUnit->getNextUnitOffset() exclusive.
Reviewers: dblaikie samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5262
llvm-svn: 218040
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.
There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.
llvm-svn: 218038
Summary:
The N32/N64 ABI's return f128 values in $f0 and $f2 for hard-float and $v0 and
$a0 for soft-float. The registers used in the soft-float case differ from the
usual $v0, and $v1 specified for return values.
Both cases were previously handled by duplicating the CCState::AnalyzeReturn()
and CCState::AnalyzeCallReturn() functions and modifying them to delegate to
a different assignment function for f128 and further replace the register type
for the hard-float case. There is a simpler way to do both of these.
We now use the common functions and select an initial assignment function based
on whether the original type is f128 or not. We then handle the hard-float case
using CCBitConvertToType<>.
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5269
llvm-svn: 218036
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.
llvm-svn: 218034
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
llvm-svn: 218031
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.
and x1, x1, #0xffffffff ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]
llvm-svn: 218030
Certain directives are unsupported on Windows (some of which could/should be
supported). We would not diagnose the use but rather crash during the emission
as we try to access the Target Streamer. Add an assertion to prevent creating a
NULL reference (which is not permitted under C++) as well as a test to ensure
that we can diagnose the disabled directives.
llvm-svn: 218014
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.
I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.
I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.
llvm-svn: 218013
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.
As an additional bonus, cleanup the caret diagnostic for the non-MachO case and
avoid the spurious error caused by not discarding the statement.
llvm-svn: 218012
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
No functional change.
llvm-svn: 218004
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization.
llvm-svn: 217993