Commit Graph

108146 Commits

Author SHA1 Message Date
Juergen Ributzka 0616d9d41a [FastISel][AArch64] Factor out scale factor calculation. NFC.
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.

llvm-svn: 218652
2014-09-30 00:49:54 +00:00
Nick Kledzik 5ffacc1655 [llvm-objdump] switch some uses of format() to format_hex() and left_justify()
llvm-svn: 218649
2014-09-30 00:19:58 +00:00
Eric Christopher a2db922c0e Simplify conditional.
llvm-svn: 218643
2014-09-29 23:31:13 +00:00
Adam Nemet 6bddb8c3a5 [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs
No functionality change.

Makes the code more compact (see the FMA part).

This needs a new type attribute MemOpFrag in X86VectorVTInfo.  For now I only
defined this in the simple cases.  See the commment before the attribute.

Diff of X86.td.expanded before and after is empty except for the appearance of
the new attribute.

llvm-svn: 218637
2014-09-29 22:54:41 +00:00
Hans Wennborg f26bfc1671 WinCOFFObjectWriter: optimize the string table for common suffices
This is a follow-up from r207670 which did the same for ELF.

Differential Revision: http://reviews.llvm.org/D5530

llvm-svn: 218636
2014-09-29 22:43:20 +00:00
Eric Christopher 6a0551e43a Add soft-float to the key for the subtarget lookup in the TargetMachine
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.

Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.

llvm-svn: 218632
2014-09-29 21:57:54 +00:00
Eric Christopher 9b270d4dc9 Fix spelling and reflow comments.
llvm-svn: 218631
2014-09-29 21:57:52 +00:00
Dave Estes 5f9daea101 [AArch64] Refines the Cortex-A57 Machine Model
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.

Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.

Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.

llvm-svn: 218627
2014-09-29 21:27:36 +00:00
David Blaikie ce3f573ae8 Unit test r218187, changing RTDyldMemoryManager::getSymbolAddress's behavior favor mangled lookup over unmangled lookup.
The contract of this function seems problematic (fallback in either
direction seems like it could produce bugs in one client or another),
but here's some tests for its current behavior, at least. See the
commit/review thread of r218187 for more discussion.

llvm-svn: 218626
2014-09-29 21:25:13 +00:00
Aaron Ballman be8ce197aa Fixing the build for compilers which do not yet have support for constexpr functions, NFC.
llvm-svn: 218622
2014-09-29 20:27:01 +00:00
Jordan Rose 59e4e1b5fe Add getValueOr to llvm::Optional<T>.
This takes a single argument convertible to T, and
- if the Optional has a value, returns the existing value,
- otherwise, constructs a T from the argument and returns that.

Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.

llvm-svn: 218618
2014-09-29 18:56:08 +00:00
Jordan Rose 40424cd6ab Add "typedef T value_type;" to llvm::Optional<T>.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.

llvm-svn: 218617
2014-09-29 18:56:05 +00:00
Matt Arsenault 9f617a0cb2 Fixing missing C++ mode comment
llvm-svn: 218612
2014-09-29 15:55:18 +00:00
Matt Arsenault 1fd0c62821 Fix include order
llvm-svn: 218611
2014-09-29 15:53:15 +00:00
Matt Arsenault 9783e00d1e R600/SI: Fix hardcoded values for modifiers.
Move enums to SIDefines.h

llvm-svn: 218610
2014-09-29 15:50:26 +00:00
Matt Arsenault 3d4233fe48 R600/SI: Also fix fsub + fadd a, a to mad combines
llvm-svn: 218609
2014-09-29 14:59:38 +00:00
Matt Arsenault 02cb0ff7db R600/SI: Fix using mad with multiplies by 2
These turn into fadds, so combine them into the target
mad node.

fadd (fadd (a, a), b) -> mad 2.0, a, b

llvm-svn: 218608
2014-09-29 14:59:34 +00:00
Chad Rosier 70d54ac848 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Frederic Riss 312a02e193 Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single DWARFUnitSection.
There will be multiple TypeUnits in an unlinked object that will be extracted
from different sections. Now that we have DWARFUnitSection that is supposed
to represent an input section, we need a DWARFUnitSection<TypeUnit> per
input .debug_types section.

Once this is done, the interface is homogenous and we can move the Section
parsing code into DWARFUnitSection.

This is a respin of r218513 that got reverted because it broke some builders.
This new version features an explicit move constructor for the DWARFUnitSection
class to workaround compilers unable to generate correct C++11 default
constructors.

Reviewers: samsonov, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5482

llvm-svn: 218606
2014-09-29 13:56:39 +00:00
Kevin Qin fc02e3c363 Use a loop to simplify the runtime unrolling prologue.
Runtime unrolling will create a prologue to execute the extra
iterations which is can't divided by the unroll factor. It
generates an if-then-else sequence to jump into a factor -1
times unrolled loop body, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1:  LoopBody;
    L2:  LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

It means if the unroll factor is 4, the loop body will be 7
times unrolled, 3 are in loop prologue, and 4 are in the loop.
This commit is to use a loop to execute the extra iterations
in prologue, like

        extraiters = tripcount % loopfactor
        if (extraiters == 0) jump Loop:
        else jump Prol
 Prol:  LoopBody;
        extraiters -= 1                 // Omitted if unroll factor is 2.
        if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
        if (tripcount < loopfactor) jump End
 Loop:
 ...
 End:

Then when unroll factor is 4, the loop body will be copied by
only 5 times, 1 in the prologue loop, 4 in the original loop.
And if the unroll factor is 2, new loop won't be created, just
as the original solution.

llvm-svn: 218604
2014-09-29 11:15:00 +00:00
Oliver Stannard a4eba5ad70 [Thumb2] ldrexd and strexd are not defined on v7M
The Thumb2 ldrexd and strexd instructions are not defined for
M-class architectures.

llvm-svn: 218603
2014-09-29 10:57:29 +00:00
Chandler Carruth 6cbf43167b [x86] Make the new vector shuffle lowering lower blends as VSELECT
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.

The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.

llvm-svn: 218600
2014-09-29 09:57:07 +00:00
Jyoti Allur b76b57fefd Remove dead code from DIBuilder
llvm-svn: 218593
2014-09-29 06:32:54 +00:00
Chandler Carruth b1cc7a8542 [x86] Delete a bunch of really bad and totally unnecessary code in the
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.

The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.

llvm-svn: 218589
2014-09-29 02:01:20 +00:00
Chandler Carruth d639c7a829 [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain
crossing and generally work more like the blend emission code in the new
vector shuffle lowering.

My goal is to have the new vector shuffle lowering just produce VSELECT
nodes that are either matched here to BLENDI or are legal and matched in
the .td files to specific blend instructions. That seems much cleaner as
there are other ways to produce a VSELECT anyways. =]

No *observable* functionality changed yet, mostly because this code
appears to be near-dead. The behavior of this lowering routine did
change though. This code being mostly dead and untestable will change
with my next commit which will also point some new tests at it.

llvm-svn: 218588
2014-09-29 01:32:54 +00:00
Chandler Carruth 2f9e56e527 [x86] Improve naming and comments for VSELECT lowering.
No functionality changed.

llvm-svn: 218586
2014-09-29 00:51:58 +00:00
Chandler Carruth c7129276cd [x86] Add the dispatch skeleton to the new vector shuffle lowering for
AVX-512.

There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.

Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).

llvm-svn: 218585
2014-09-29 00:37:27 +00:00
Chandler Carruth 32a3ebda14 [x86] Make the split-and-lower routine fully generic by relaxing the
assertion, making the name generic, and improving the documentation.

Step 1 in adding very primitive support for AVX-512. No functionality
changed yet.

llvm-svn: 218584
2014-09-29 00:21:49 +00:00
Chandler Carruth 24e3b69cbd [x86] Teach the new vector shuffle lowering to fall back on AVX-512
vectors.

Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.

llvm-svn: 218583
2014-09-28 23:53:10 +00:00
Chandler Carruth abe742e8fb [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2
lowerings.

This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.

Anyways, thanks to Hal for helping me sort out what these *should* be.

llvm-svn: 218582
2014-09-28 23:23:55 +00:00
Matt Arsenault 93ffe58f90 Add MachineOperand::ChangeToFPImmediate and setFPImm
llvm-svn: 218579
2014-09-28 19:24:59 +00:00
Chandler Carruth 6578f9208b [x86] Fix a really silly bug that I introduced fixing another bug in the
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.

Have I mentioned that I loathe implicit conversions recently? :: sigh ::

llvm-svn: 218576
2014-09-28 06:11:04 +00:00
Chandler Carruth b10c6b8e9e [x86] Fix yet another bug in the new vector shuffle lowering's handling
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.

Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.

llvm-svn: 218575
2014-09-28 03:30:25 +00:00
Hans Wennborg ba80b5d43c WinCOFFObjectWriter.cpp: make write_uint32_le more efficient
llvm-svn: 218574
2014-09-28 00:22:27 +00:00
James Molloy 463db9a77c [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!

llvm-svn: 218569
2014-09-27 17:02:54 +00:00
Yaron Keren 7b4133ac81 Fix llvm::huge_valf multiple initializations with Visual C++.
llvm::huge_valf is defined in a header file, so it is initialized
multiple times in every compiled unit upon program startup.

With non-VC compilers huge_valf is set to a HUGE_VALF which the
compiler can probably optimize out.

With VC numeric_limits<float>::infinity() does not return a number
but a runtime structure member which therotically may change 
between calls so the compiler does not optimize out the 
initialization and it happens many times. It can be easily seen by 
placing a breakpoint on the initialization line.

This patch moves llvm::huge_valf initialization to a source file
instead of the header.

llvm-svn: 218567
2014-09-27 14:41:29 +00:00
Chandler Carruth f4b9e6b9d9 [x86] Fix yet another issue with widening vector shuffle elements.
I spotted this by inspection when debugging something else, so I have no
test case what-so-ever, and am not even sure it is possible to
realistically trigger the bug. But this is what was intended here.

llvm-svn: 218565
2014-09-27 08:40:33 +00:00
Craig Topper 5ed88de99b Update test case to match minor formatting change introduced in r218563.
llvm-svn: 218564
2014-09-27 05:36:53 +00:00
Craig Topper 5546f8c8cc Reduce code duplication a bit.
llvm-svn: 218563
2014-09-27 05:26:42 +00:00
Chandler Carruth 4d03be1717 [x86] Fix terrible bugs everywhere in the new vector shuffle lowering
and in the target shuffle combining when trying to widen vector
elements.

Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.

llvm-svn: 218562
2014-09-27 04:42:44 +00:00
Chandler Carruth 81e6b29f03 [x86] Flip the sentinel values used in the target shuffle mask decoding
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.

llvm-svn: 218561
2014-09-27 04:42:39 +00:00
Craig Topper 5996da2032 Fix TableGen -gen-disassembler output for bit fields with an offset.
This fixes bit assignments like this
Inst{7-0} = Foo{9-2}

Patch by Steve King.

llvm-svn: 218560
2014-09-27 04:38:02 +00:00
Sanjay Patel bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
Richard Smith 2b91a7f80f Add LLVM_ENABLE_MODULES flag to CMake to enable building with C++ modules.
llvm-svn: 218551
2014-09-26 22:40:15 +00:00
David Majnemer 601327c4b9 llvm-vtabledump: Further simplification
Hoist out calls to getSection and getContents.  No functional change
intended.

llvm-svn: 218550
2014-09-26 22:32:19 +00:00
David Majnemer dac39857d6 Object: BSS/virtual sections don't have contents
Users of getSectionContents shouldn't try to pass in BSS or virtual
sections.  In all instances, this is a bug in the code calling this
routine.

N.B. Some COFF implementations (like CL) will mark their BSS sections as
taking space on disk.  This would confuse COFFObjectFile into thinking
the section is larger than the file.

llvm-svn: 218549
2014-09-26 22:32:16 +00:00
Yaron Keren abce3c4e18 clang-format of ChangeStdinToBinary & ChangeStdoutToBinary.
llvm-svn: 218547
2014-09-26 22:27:11 +00:00
Kevin Enderby 8597488e5e Update llvm-objdump’s Mach-O symbolizer code to print the name of symbol stubs.
So in fully linked images when a call is made through a stub it now gets a
comment like the following in the disassembly:

    callq	0x100000f6c             ## symbol stub for: _printf

indicating the call is to a symbol stub and which symbol it is for.  This is
done for branch reference types and seeing if the branch target is in a stub
section and if so using the indirect symbol table entry for that stub and
using that symbol table entries symbol name.

llvm-svn: 218546
2014-09-26 22:20:44 +00:00
Richard Smith e06ffe2c2d Remove definition of LLVM_VERSION_INFO; this macro is not used by any of the
files in this directory. If it should be defined anywhere, it should be defined
when building lib/LTO/LTOCodeGenerator.cpp, but we've not had it defined there
for quite some time, so that doesn't really seem to be very important. (It also
would slow down the modules build by creating extra module variants.)

llvm-svn: 218544
2014-09-26 21:53:12 +00:00
Richard Smith ca9ae10c1a Fix CMake warning CMP0054: don't quote a variable name that is intended to be
expanded; future versions of cmake may not expand the variable in this case.

llvm-svn: 218543
2014-09-26 21:35:48 +00:00