Commit Graph

31480 Commits

Author SHA1 Message Date
Daniel Sanders a39ef1c68f [mips] [IAS] Add support for the DLA pseudo-instruction and fix problems with DLI
Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures.

Reviewers: tomatabacu, seanbruno, vkalintiris

Subscribers: brooks, seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D9524

llvm-svn: 245208
2015-08-17 10:11:55 +00:00
Michael Kuperstein adc4e9c414 [GMR] isNonEscapingGlobalNoAlias() should look through Bitcasts/GEPs when looking at loads.
This fixes yet another case from PR24288.

Differential Revision: http://reviews.llvm.org/D12064

llvm-svn: 245207
2015-08-17 10:06:08 +00:00
James Molloy ef183397b1 Generate FMINNAN/FMINNUM/FMAXNAN/FMAXNUM from SDAGBuilder.
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.

For example on AArch32 (V8), we have scalar fminnm but not fmin.

Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.

llvm-svn: 245196
2015-08-17 07:13:10 +00:00
Karthik Bhat 3af28945b9 Fix PR24469 resulting from r245025 and re-enable dead store elimination across basicblocks.
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was
deleting the next basic block iterator. Fixed the same by resetting the basic block iterator
post call to DeleteDeadInstruction.

llvm-svn: 245195
2015-08-17 05:51:39 +00:00
David Majnemer 8ed559ad22 Revert "[InstCombinePHI] Partial simplification of identity operations."
This reverts commit r244887, it caused PR24470.

llvm-svn: 245194
2015-08-17 03:11:26 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
Sanjay Patel 57fd1dc5db transform fmin/fmax calls when possible (PR24314)
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).

This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.

This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314

Differential Revision: http://reviews.llvm.org/D11866

llvm-svn: 245187
2015-08-16 20:18:19 +00:00
David Majnemer e04443baff Revert "Add support for cross block dse. This patch enables dead stroe elimination across basicblocks."
This reverts commit r245025, it caused PR24469.

llvm-svn: 245172
2015-08-16 07:11:59 +00:00
David Majnemer dfa3b09541 [InstCombine] Replace an and+icmp with a trunc+icmp
Bitwise arithmetic can obscure a simple sign-test.  If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.

llvm-svn: 245171
2015-08-16 07:09:17 +00:00
David Majnemer 1a59e49f3c [X86] Widen the 'AND' mask if doing so shrinks the encoding size
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero.  This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.

This fixes PR24085.

Differential Revision: http://reviews.llvm.org/D11289

llvm-svn: 245169
2015-08-16 04:52:11 +00:00
Sanjay Patel 40d4eb40f6 [x86] enable machine combiner reassociations for scalar single-precision minimums
llvm-svn: 245166
2015-08-15 17:01:54 +00:00
Simon Pilgrim d65ace84c7 Updated broadcast stack folding test to avoid use of broadcast intrinsics.
llvm-svn: 245165
2015-08-15 16:54:18 +00:00
Sanjay Patel 3b7e3677e3 fix typos; NFC
llvm-svn: 245164
2015-08-15 16:53:08 +00:00
Sanjay Patel 9f6c7dddd2 add test case to show current codegen
llvm-svn: 245163
2015-08-15 16:49:50 +00:00
Simon Pilgrim 0750c84623 [DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.

(zext (truncate x)) -> (zext (and(x, m))

Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.

This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.

Differential Revision: http://reviews.llvm.org/D11764

llvm-svn: 245160
2015-08-15 13:27:30 +00:00
David Majnemer 0bc0eef71c [IR] Give catchret an optional 'return value' operand
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction.  CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.

llvm-svn: 245149
2015-08-15 02:46:08 +00:00
JF Bastien 5e4303dc14 Accelerate MergeFunctions with hashing
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.

The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.

The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.

The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.

The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.

Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.

Author: jrkoenig

Reviewers: nlewycky, dschuff, jfb

Subscribers: llvm-commits, aemerson

Differential revision: http://reviews.llvm.org/D11923

llvm-svn: 245140
2015-08-15 01:18:18 +00:00
Matt Arsenault 427a0fd22e LoopStrengthReduce: Try to pass address space to isLegalAddressingMode
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.

llvm-svn: 245137
2015-08-15 00:53:06 +00:00
Matt Arsenault 297ae311ce AMDGPU/SI: Fix printing useless info with amdhsa
The comments at the bottom would all report 0 if
amdhsa was used.

llvm-svn: 245135
2015-08-15 00:12:39 +00:00
Sanjay Patel 7332e0455f make current codegen visible in the checks, so we can decide if it's right
llvm-svn: 245120
2015-08-14 23:03:01 +00:00
Nick Lewycky 8075fd22b9 Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458!
llvm-svn: 245119
2015-08-14 22:46:49 +00:00
Bjarke Hammersholt Roune 9791ed4705 [SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.

This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i - offset];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[i * stride];

  for (int i = 0; i < numIterations; ++i)
    sum += ptr[3 * (i << 7)];


Reviewers: atrick, sanjoy

Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben

Differential Revision: http://reviews.llvm.org/D11860

llvm-svn: 245118
2015-08-14 22:45:26 +00:00
Pat Gavlin b399095c3f Add a target environment for CoreCLR.
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).

Differential Revision: http://reviews.llvm.org/D11012

llvm-svn: 245115
2015-08-14 22:41:43 +00:00
Sanjay Patel dd175bc6c4 make current codegen visible in the checks, so we can decide if it's right
llvm-svn: 245108
2015-08-14 22:10:59 +00:00
Ahmed Bougacha cd35787217 [AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns.
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.

llvm-svn: 245107
2015-08-14 22:06:05 +00:00
Evgeniy Stepanov 24ac55d884 [msan] Fix handling of musttail calls.
MSan instrumentation for return values of musttail calls is not
allowed by the IR constraints, and not needed at the same time.

llvm-svn: 245106
2015-08-14 22:03:50 +00:00
Alex Lorenz 577d271a75 MIR Serialization: Serialize the '.cfi_same_value' CFI directive.
llvm-svn: 245103
2015-08-14 21:55:58 +00:00
Alex Lorenz c3ba7508f6 MIR Serialization: Serialize the external symbol call entry pseudo source
values.

llvm-svn: 245098
2015-08-14 21:14:50 +00:00
Alex Lorenz 50b826fb75 MIR Serialization: Serialize the global value call entry pseudo source values.
llvm-svn: 245097
2015-08-14 21:08:30 +00:00
Renato Golin 980b6cc42b Revert "[ARM] Fix MachO CPU Subtype selection"
This reverts commit r245081, as it breaks many builds.

llvm-svn: 245086
2015-08-14 19:35:47 +00:00
Alex Lorenz 1039fd1ae5 MIR Serialization: Serialize the 'internal' register operand flag.
llvm-svn: 245085
2015-08-14 19:07:07 +00:00
Alex Lorenz f9a2b12361 MIR Serialization: Serialize the bundled machine instructions.
llvm-svn: 245082
2015-08-14 18:57:24 +00:00
Vedant Kumar 2f079be789 [ARM] Fix MachO CPU Subtype selection
This patch makes the Darwin ARM backend take advantage of TargetParser.  It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.

Differential Revision: http://reviews.llvm.org/D11996

llvm-svn: 245081
2015-08-14 18:36:47 +00:00
Sanjay Patel ed502905f7 [x86] fix allowsMisalignedMemoryAccess() implementation
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.

Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.

Differential Revision: http://reviews.llvm.org/D10662

llvm-svn: 245075
2015-08-14 17:53:40 +00:00
Vedant Kumar 06f0678010 [test] Testing write access to llvm
llvm-svn: 245074
2015-08-14 17:42:50 +00:00
Reid Kleckner a57d015154 [sancov] Leave llvm.localescape in the entry block
Summary: Similar to the change we applied to ASan. The same test case works.

Reviewers: samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11961

llvm-svn: 245067
2015-08-14 16:45:42 +00:00
Chad Rosier 67dca908fe Cleanup test whitespace or lack thereof. NFC.
llvm-svn: 245065
2015-08-14 16:34:15 +00:00
Rafael Espindola dbaf0498a9 Revert "Centralize the information about which object format we are using."
This reverts commit r245047.

It was failing on the darwin bots. The problem was that when running

./bin/llc -march=msp430

llc gets to

  if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getDefaultTargetTriple());

Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.

That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.

llvm-svn: 245062
2015-08-14 15:48:41 +00:00
Davide Italiano c0ea1e6d63 Convert tests under MC/ELF from macho-dump to llvm-readobj.
Yet another step towards deprecating macho-dump.

llvm-svn: 245059
2015-08-14 15:16:37 +00:00
Rafael Espindola 90eb70c8a7 Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.

It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.

llvm-svn: 245047
2015-08-14 13:31:17 +00:00
Simon Pilgrim 1fdc177ccf Renamed min tests (typo)
llvm-svn: 245038
2015-08-14 11:03:31 +00:00
James Molloy 63be198712 [AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.

I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.

llvm-svn: 245035
2015-08-14 09:08:50 +00:00
David Majnemer b611e3f50e [IR] Add token types
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.

There are several applications for such a type but my immediate
motivation stems from WinEH.  Our personality routine enforces a
single-entry - single-exit regime for cleanups.  After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together.  We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.

Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup.  This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.

What is the burden to the optimizer?  Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway.  There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.

Differential Revision: http://reviews.llvm.org/D11861

llvm-svn: 245029
2015-08-14 05:09:07 +00:00
Karthik Bhat ddc2a86a00 Add support for cross block dse.
This patch enables dead stroe elimination across basicblocks.

Example:
define void @test_02(i32 %N) {
  %1 = alloca i32
  store i32 %N, i32* %1
  store i32 10, i32* @x
  %2 = load i32, i32* %1
  %3 = icmp ne i32 %2, 0
  br i1 %3, label %4, label %5

; <label>:4
  store i32 5, i32* @x
  br label %7

; <label>:5
  %6 = load i32, i32* @x
  store i32 %6, i32* @y
  br label %7

; <label>:7
  store i32 15, i32* @x
  ret void
}
In the above example dead store "store i32 5, i32* @x" is now eliminated.

Differential Revision: http://reviews.llvm.org/D11143

llvm-svn: 245025
2015-08-14 04:17:23 +00:00
Jingyue Wu 1238f341ba [SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow.
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.

One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.

Reviewers: broune

Subscribers: jholewinski, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12016

llvm-svn: 245003
2015-08-14 02:02:05 +00:00
Alex Lorenz 5022f6bb81 MIR Serialization: Change MIR syntax - use custom syntax for MBBs.
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.

This is an example of a function's body that uses the old syntax:

    body:
      - id: 0
        name: entry
        instructions:
          - '%eax = MOV32r0 implicit-def %eflags'
          - 'RETQ %eax'
    ...

The same body is now written like this:

    body: |
      bb.0.entry:
        %eax = MOV32r0 implicit-def %eflags
        RETQ %eax
    ...

This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:

   BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
      t2IT 1, 24, implicit-def %itstate
      %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
   }

This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.

llvm-svn: 244982
2015-08-13 23:10:16 +00:00
Ahmed Bougacha 80e4ac802a [AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.

Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.

llvm-svn: 244955
2015-08-13 21:09:13 +00:00
Alex Lorenz 6866104073 MIR Parser: Don't allow negative alignments for memory operands.
llvm-svn: 244953
2015-08-13 20:55:01 +00:00
Davide Italiano a195386ca1 [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.

Code generated before the patch was applied:


 0:   0f bc c7                bsf    %edi,%eax
 3:   b9 20 00 00 00          mov    $0x20,%ecx
 8:   0f 45 c8                cmovne %eax,%ecx
 b:   83 c1 02                add    $0x2,%ecx
 e:   b8 01 00 00 00          mov    $0x1,%eax
13:   85 ff                   test   %edi,%edi
15:   0f 45 c1                cmovne %ecx,%eax
18:   c3                      retq

Code generated after the patch was applied:

 0:   0f bc cf                bsf    %edi,%ecx
 3:   83 c1 02                add    $0x2,%ecx
 6:   85 ff                   test   %edi,%edi
 8:   b8 01 00 00 00          mov    $0x1,%eax
 d:   0f 45 c1                cmovne %ecx,%eax
10:   c3                      retq

It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.

Differential Revision:  http://reviews.llvm.org/D11989	

llvm-svn: 244947
2015-08-13 20:34:26 +00:00
Simon Pilgrim 829091e310 [X86][SSE] Tests for SMAX/SMIN/UMAX/UMIN vector instructions
llvm-svn: 244944
2015-08-13 20:31:03 +00:00