Commit Graph

27009 Commits

Author SHA1 Message Date
Quentin Colombet c32615dfef [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
Kostya Serebryany 001ea5fe15 [asan] fix caller-calee instrumentation to emit new cache for every call site
llvm-svn: 220973
2014-10-31 17:11:27 +00:00
Rafael Espindola 67926f10e8 Unify and update link-messages.ll and redefinition.ll. NFC.
llvm-svn: 220968
2014-10-31 16:52:30 +00:00
Chad Rosier a675e550ca [AArch64] CondOpt pass is missing FCMP instructions when searching backward for
a CMP which defines the flags used by B.CC.

http://reviews.llvm.org/D6047
Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!

llvm-svn: 220961
2014-10-31 15:17:36 +00:00
Bradley Smith 9992b167ae [SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.

This allows us to unroll loops such as:

 void testcase3(int v) {
   for (int i=v; i<=v+1; ++i)
     f(i);
 }

llvm-svn: 220960
2014-10-31 11:40:32 +00:00
Ulrich Weigand c8c2ea2854 [PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.

The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.

llvm-svn: 220959
2014-10-31 10:33:14 +00:00
Peter Zotov e2b8b1431c [OCaml] Ensure consistent naming.
Specifically:
  * Directories match module names.
  * Test names match module names.
  * The language is called "OCaml", not "Ocaml".

llvm-svn: 220958
2014-10-31 09:19:03 +00:00
Peter Zotov b1f54ff42f [OCaml] Rework Llvm_executionengine using ctypes.
Since JIT->MCJIT migration, most of the ExecutionEngine interface
became deprecated and/or broken. This especially affected the OCaml
bindings, as runFunction is no longer available, and unlike in C,
it is not possible to coerce a pointer to a function and call it
in OCaml.

In practice, LLVM 3.5 shipped completely unusable
Llvm_executionengine.

The GenericValue interface and runFunction were essentially
a poor man's FFI. As such, this interface was removed and instead
a dependency on ctypes >=0.3 added, which handled platform-specific
aspects of accessing data and calling functions.

The new interface does not expose JIT (which is a shim around MCJIT),
as well as the interpreter (which can't handle a lot of valid IR).

Llvm_executionengine.add_global_mapping is currently unusable
due to PR20656.

llvm-svn: 220957
2014-10-31 09:05:36 +00:00
Rafael Espindola 5b24759dff Move an input file to Inputs instead of using RUN: true.
llvm-svn: 220953
2014-10-31 05:54:15 +00:00
David Majnemer c7d7c6fb3a Object, COFF: Cleanup symbol type code, improve binutils compatibility
Do a better job classifying symbols.  This increases the consistency
between the COFF handling code and the ELF side of things.

llvm-svn: 220952
2014-10-31 05:07:00 +00:00
Rafael Espindola e5204efeaf merge tests for constant linking.
llvm-svn: 220951
2014-10-31 05:04:16 +00:00
Hao Liu e02b1a068f PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend.
Initial patch by Oleg Ranevskyy.

llvm-svn: 220945
2014-10-31 02:35:34 +00:00
Ahmed Bougacha 9f336c4ec5 [SelectionDAG] When scalarizing trunc, don't assert for legal operands.
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.

This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.

Fixes PR20777.

llvm-svn: 220937
2014-10-30 23:46:50 +00:00
NAKAMURA Takumi d9913e6d35 llvm/test/Transforms/SampleProfile/syntax.ll: Relax MISSING-FILE not to
check locale-aware message catalog.

llvm-svn: 220934
2014-10-30 22:28:46 +00:00
Louis Gerbarg e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
Rafael Espindola d1e64b1e93 Fix the merging of the constantness of declarations.
The langref says:

LLVM explicitly allows declarations of global variables to be marked
constant, even if the final definition of the global is not. This
capability can be used to enable slightly better optimization of the
program, but requires the language definition to guarantee that
optimizations based on the ‘constantness’ are valid for the
translation units that do not include the definition.

Given that definition, when merging two declarations, we have to drop
constantness if of of them is not marked contant, since the Module
without the constant marker might not have the necessary guarantees.

llvm-svn: 220927
2014-10-30 20:50:23 +00:00
Philip Reames 4cb4d3e048 Add handling for range metadata in ValueTracking isKnownNonZero
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes.  This helps to remove redundant checks and canonicalize checks for other optimization passes.  This particular patch checks whether a value is known to be non-zero from the range metadata.

Currently, these tests are against InstCombine.  In theory, all of these should be InstSimplify since we're not inserting any new instructions.  Moving the code may follow in a separate change.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947

llvm-svn: 220925
2014-10-30 20:25:19 +00:00
Rafael Espindola f9a94abd4e Update test to pass .ll to llvm-link and use Inputs.
llvm-svn: 220924
2014-10-30 20:23:59 +00:00
David Blaikie 76fd43c653 PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site.
llvm-svn: 220923
2014-10-30 20:20:11 +00:00
Peter Zotov cf8f7a10b7 lit: PR21417: don't try to update OCAMLPATH if LibDir is empty.
llvm-svn: 220919
2014-10-30 19:26:42 +00:00
Diego Novillo c572e92c76 Add profile writing capabilities for sampling profiles.
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.

The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.

Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.

There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.

The new test checks that reading the same profile encoded as text or
raw, produces the same results.

Reviewers: bogner, dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6000

llvm-svn: 220915
2014-10-30 18:00:06 +00:00
Tim Northover a94cd25188 ARM: test default values for TAG_CPU_unaligned_access attribute.
It should be on for every target that supports unaligned accesses (e.g. not
v6m).

Patch by Charlie Turner.

llvm-svn: 220912
2014-10-30 17:05:44 +00:00
Robert Khasanov af318f7073 [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.
Refactored through AVX512_maskable
        

llvm-svn: 220908
2014-10-30 14:21:47 +00:00
Peter Collingbourne dd3486ece1 [dfsan] New calling convention for custom functions with variadic arguments.
Summary:
The previous calling convention prevented custom functions from being able
to access argument labels unless it knew how many variadic arguments there
were, and of which type. This restriction made it impossible to correctly
model functions in the printf family, as it is legal to pass more arguments
than required to those functions. We now pass arguments in the following order:

non-vararg arguments
labels for non-vararg arguments
[if vararg function, pointer to array of labels for vararg arguments]
[if non-void function, pointer to label for return value]
vararg arguments

Differential Revision: http://reviews.llvm.org/D6028

llvm-svn: 220906
2014-10-30 13:22:57 +00:00
Peter Zotov e75657b727 [OCaml] Expose LLVM{Get,Set}DLLStorageClass.
llvm-svn: 220902
2014-10-30 08:30:08 +00:00
Peter Zotov f481471495 [OCaml] Test code emission in Llvm_target.
Prior to this commit, the Llvm_target tests (ab)used
the Llvm_executionengine as a mechanism to initialize at least some
target. This needlessly restricted tests to builds which can emit
code for their host architecture.

llvm-svn: 220901
2014-10-30 08:30:01 +00:00
Peter Zotov d6d49fd0e2 [OCaml] Enable backtraces in tests.
llvm-svn: 220900
2014-10-30 08:29:57 +00:00
Peter Zotov 668f9670a6 [OCaml] [autoconf] Migrate to ocamlfind.
This commit updates the OCaml bindings and tests to use ocamlfind.
The bindings are migrated in order to use ctypes, which are now
required for MCJIT-backed Llvm_executionengine.
The tests are migrated in order to use OUnit and to verify that
the distributed META.llvm allows to build working executables.

Every OCaml toolchain invocation is now chained through ocamlfind,
which (in theory) allows to cross-compile the OCaml bindings.

The configure script now checks for ctypes (>= 0.2.3) and
OUnit (>= 2). The code depending on these libraries will be added
later. The configure script does not check the package versions
in order to keep changes less invasive.

Additionally, OCaml bindings will now be automatically enabled
if ocamlfind is detected on the system, rather than ocamlc, as it
was before.

llvm-svn: 220899
2014-10-30 08:29:45 +00:00
Rafael Espindola 919fb535a0 Enable the slp vectorizer in the gold plugin.
llvm-svn: 220887
2014-10-30 00:38:54 +00:00
Rafael Espindola 8391dbd1c6 Enable the loop vectorizer in the gold plugin.
llvm-svn: 220886
2014-10-30 00:11:24 +00:00
Rafael Espindola 4a3b6cf3c1 Replace also-emit-llvm with save-temps.
The also-emit-llvm option only supported getting the IR before optimizations.
This patch replaces it with a more generic save-temps option that saves the IR
both before and after optimizations.

llvm-svn: 220885
2014-10-29 23:54:45 +00:00
NAKAMURA Takumi 27b6d47f36 llvm/test/Transforms/LoopRotate/nosimplifylatch.ll: Fix possibly mis-repeatedly-pasted test.
llvm-svn: 220880
2014-10-29 23:05:12 +00:00
Yi Jiang 323d57336c Test Case for r220872:Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220873
2014-10-29 20:20:33 +00:00
Yi Jiang ab19fff4d8 Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220872
2014-10-29 20:19:47 +00:00
Jan Wen Voung ce2164f45c Fix getRelocationValueString to return the symbol name for EM_386.
Summary: This helps llvm-objdump -r to print out the symbol name along
with the relocation type on x86. Adjust existing tests from checking
for "Unknown" to check for the symbol now.

Test Plan: Adjusted test/Object tests.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5987

llvm-svn: 220866
2014-10-29 18:37:13 +00:00
Robert Khasanov 595e598869 [AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP*, VSUBP*, VMULP*, VDIVP*, VMAXP*, VMINP*)
Refactored through AVX512_maskable
Added encoding tests for them.

llvm-svn: 220858
2014-10-29 15:43:02 +00:00
Peter Zotov f58626d5c9 [OCaml] Expose Llvm_target.TargetMachine.add_analysis_passes.
llvm-svn: 220846
2014-10-29 08:16:14 +00:00
Peter Zotov 5f28729c61 [OCaml] Expose Llvm_bitwriter.write_bitcode_to_memory_buffer.
llvm-svn: 220844
2014-10-29 08:16:01 +00:00
Peter Zotov 662538ac40 [OCaml] Drop support for 3.12.1 and earlier.
In practice this means:
  * Always using -g flag.
  * Embedding -cclib -lstdc++ into the corresponding cma/cmxa file.
    This also moves -lstdc++ in a single place.
  * Using caml_named_value instead of a homegrown mechanism.

llvm-svn: 220843
2014-10-29 08:15:54 +00:00
Peter Zotov e447b61c50 [OCaml] Synchronize transformations with LLVM-C.
Also, rearrange the functions in a way that allows to quickly
compare C headers and .mli/glue files.

llvm-svn: 220842
2014-10-29 08:15:21 +00:00
NAKAMURA Takumi 815d752b93 macho-symbolized-disassembly.test: Don't check C++ demangler unconditionally.
For example, MS PSDK is not expected to have <cxxabi.h>.
You should introduce the new feature in lit.cfg corresponding to HAVE_CXXABI_H if you would like to test demangler.

llvm-svn: 220840
2014-10-29 08:08:21 +00:00
Saleem Abdulrasool 56ade2991b test: tweak inlined-allocs test
Remove pointless checks for storage of uninteresting values.  Ensure that we
perform basic alias analysis to make the test more correct.  Finally, apply a
stylistic change to the test.

llvm-svn: 220839
2014-10-29 06:31:11 +00:00
Kevin Enderby 04bf6931cc Update llvm-objdump’s Mach-O symbolizer code to demangle C++ names.
llvm-svn: 220833
2014-10-28 23:39:46 +00:00
Peter Zotov 1f351ca15a [OCaml] PR5595: Pass LDFLAGS to tests via -cclib.
llvm-svn: 220832
2014-10-28 23:31:13 +00:00
Peter Zotov 796537119b [OCaml] Fix whitespace.
llvm-svn: 220829
2014-10-28 22:39:42 +00:00
Peter Zotov dacfc64b57 [OCaml] PR9719, PR14727: Make tests run without ocamlopt.
Previously, tests hardcoded ocamlopt and cmxa, which broke builds on
machines without ocamlopt. Instead, they now fall back to ocamlc.

As a side effect this fixes PR14727, which was caused by a crude hack
that replaced gcc with g++ everywhere in the ocamlopt native compiler
path and passes it back using -cc. Now the tests use the same
technique as META, i.e. -cclib -lstdc++. It might be more fragile
than using g++ explicitly, but it will break when the installed
package will also break, which is good.

llvm-svn: 220828
2014-10-28 22:39:36 +00:00
Peter Zotov 1afb7497c7 [OCaml] PR19859: Add functions to query and modify branches.
Patch by Gabriel Radanne <drupyog@zoho.com>.

llvm-svn: 220818
2014-10-28 19:47:02 +00:00
Peter Zotov 6074c344de [OCaml] PR19859: Add tests for reading the values of numeric constants.
Patch by Gabriel Radanne <drupyog@zoho.com>.

llvm-svn: 220816
2014-10-28 19:46:52 +00:00
Saleem Abdulrasool d178ada55e Transforms: reapply SVN r219899
This restores the commit from SVN r219899 with an additional change to ensure
that the CodeGen is correct for the case that was identified as being incorrect
(originally PR7272).

In the case that during inlining we need to synthesize a value on the stack
(i.e. for passing a value byval), then any function involving that alloca must
be stripped of its tailness as the restriction that it does not access the
parent's stack no longer holds.  Unfortunately, a single alloca can cause a
rippling effect through out the inlining as the value may be aliased or may be
mutated through an escaped external call.  As such, we simply track if an alloca
has been introduced in the frame during inlining, and strip any tail calls.

llvm-svn: 220811
2014-10-28 18:27:37 +00:00
Robert Khasanov eb12639375 [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
Refactored through AVX512_maskable

llvm-svn: 220806
2014-10-28 18:15:20 +00:00
Robert Khasanov 3e534c93b9 [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
Refactored multiclass through AVX512_maskable

llvm-svn: 220783
2014-10-28 16:37:13 +00:00
Robert Khasanov 4441c4d31b [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.

llvm-svn: 220777
2014-10-28 15:59:40 +00:00
Robert Khasanov dd09a8f320 [AVX512] Bring back vector-shuffle lowering support through broadcasts
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.

Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

 define   <16 x i32> @_inreg16xi32(i32 %a) {
 ; CHECK-LABEL: _inreg16xi32:
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vpbroadcastd %edi, %zmm0
+; CHECK-NEXT:    vmovd %edi, %xmm0
+; CHECK-NEXT:    vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
 ; CHECK-NEXT:    retq
 %b = insertelement <16 x i32> undef, i32 %a, i32 0
 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
 ret <16 x i32> %c
}

Here, 256-bit broadcast was generated instead of 512-bit one.

In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
-  assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests

llvm-svn: 220774
2014-10-28 12:28:51 +00:00
Reid Kleckner 9ccce99e1d X86: Implement the vectorcall calling convention
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.

Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.

On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D5943

llvm-svn: 220745
2014-10-28 01:29:26 +00:00
Tim Northover 00917897b2 AArch64: enable Cortex-A57 FP balancing on Cortex-A53.
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.

Patch by Z. Zheng.

llvm-svn: 220744
2014-10-28 01:24:32 +00:00
Adam Nemet cf7a4a2660 [AVX512] Add vpermil variable version
This is implemented via a multiclass that derives from the vperm imm
multiclass.

Fixes <rdar://problem/18426089>

llvm-svn: 220737
2014-10-27 23:08:40 +00:00
Pete Cooper 7c801dc90b Fix a stackmap bug introduced in r220710.
For a call to not return in to the stackmap shadow, the shadow must end with the call.

To do this, we must insert any required nops *before* the call, and not after it.

llvm-svn: 220728
2014-10-27 22:38:45 +00:00
Juergen Ributzka 7ccebec668 [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check.
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.

This fixes rdar://problem/18785125.

llvm-svn: 220713
2014-10-27 19:58:36 +00:00
Juergen Ributzka 0190fea941 [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.
Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and'
instruction.

This fixes rdar://problem/18784953.

llvm-svn: 220712
2014-10-27 19:46:23 +00:00
Pete Cooper 3c0af35232 Stackmap shadows should consider call returns a branch target.
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.

A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.

llvm-svn: 220710
2014-10-27 19:40:35 +00:00
Juergen Ributzka 90f741a2ce [FastISel][AArch64] Use 'cbz' also for null values (pointers).
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.

This fixes rdar://problem/18784732.

llvm-svn: 220709
2014-10-27 19:38:05 +00:00
Juergen Ributzka eae91040d8 [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block.
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.

This fixes rdar://problem/18784013.

llvm-svn: 220704
2014-10-27 19:16:48 +00:00
Juergen Ributzka 6de054a25a [FastISel][AArch64] Fix load/store with frame indices.
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.

This fix extends the possible addressing modes for frame indices.

This fixes rdar://problem/18783298.

llvm-svn: 220700
2014-10-27 18:21:58 +00:00
Kostya Serebryany 4f8f0c5aa2 [asan] experimental tracing for indirect calls, llvm part.
llvm-svn: 220699
2014-10-27 18:13:56 +00:00
Oliver Stannard 79efe41a0c [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.

llvm-svn: 220671
2014-10-27 09:23:02 +00:00
David Majnemer c8bdd23acf InstCombine: Fix a combine assuming that icmp operands were integers
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.

This fixes PR21388.

llvm-svn: 220664
2014-10-27 05:47:49 +00:00
Elena Demikhovsky 4b01b7306c AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
llvm-svn: 220638
2014-10-26 09:52:24 +00:00
Peter Zotov 8cdf0425a0 [OCaml] hexagon can't run MCJIT tests, XFAIL it.
llvm-svn: 220621
2014-10-25 19:01:14 +00:00
Peter Zotov 3944e6e223 [OCaml] Unbreak Llvm_executionengine.initialize_native_target.
First, return true on success, as it is the OCaml convention.
Second, also initialize the native assembly printer, which is,
despite the name, required for MCJIT operation.

Since this function did not initialize the assembly printer earlier
and no function to initialize native assembly printer was available
elsewhere, it is safe to break its interface: it means that it
simply could not be used successfully before.

llvm-svn: 220620
2014-10-25 18:50:02 +00:00
Peter Zotov d1531a2349 [OCaml] Expose Llvm_executionengine.ExecutionEngine.create_mcjit.
llvm-svn: 220619
2014-10-25 18:49:56 +00:00
Jingyue Wu fe72fcebf6 [SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.

Added a regression test in split-gep.ll

Patched by Hao Liu.

llvm-svn: 220618
2014-10-25 18:34:03 +00:00
Jingyue Wu b723152379 [SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.

Added a regression test in split-gep.ll.

Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks! 

llvm-svn: 220615
2014-10-25 17:36:21 +00:00
Jingyue Wu ea51161a94 [NVPTX] aligned byte-buffers for vector return types
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.

Test Plan: test/Codegen/NVPTX/vector-return.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5612

llvm-svn: 220607
2014-10-25 03:46:16 +00:00
Rafael Espindola 6e0b559d4f Add a test for the -suppress-warnings option.
llvm-svn: 220603
2014-10-25 01:14:15 +00:00
Evgeniy Stepanov d337a59db5 [msan] Make -msan-check-constant-shadow a bit stronger.
Allow (under the experimental flag) non-Instructions to participate in MSan checks.

llvm-svn: 220601
2014-10-24 23:34:15 +00:00
Kevin Enderby 2813f496d9 Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol.
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section.  But when either symbol is undefined it
is an error.

The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.

rdar://18678402

llvm-svn: 220599
2014-10-24 22:39:40 +00:00
Simon Pilgrim fd080af0c5 [X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.

An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.

Differential Revision: http://reviews.llvm.org/D5917

llvm-svn: 220592
2014-10-24 21:04:41 +00:00
Colin LeMahieu 838307b31f [Hexagon] Resubmission of 220427
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220584
2014-10-24 19:00:32 +00:00
Sanjay Patel f924e11967 Allow AVX vrsqrtps generation.
This is a follow-on to r220570 that allows a 256-bit (v8f32)
version of vrsqrtps to be generated.

llvm-svn: 220579
2014-10-24 17:59:18 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Daniel Sanders 19f01658fe [mips] For N32/N64, structs must be passed in the upper bits of a register.
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5963

llvm-svn: 220556
2014-10-24 13:09:19 +00:00
Oliver Stannard f7a5afc3f2 [AArch64] Fix fast-isel of cbz of i1, i8, i16
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.

llvm-svn: 220553
2014-10-24 09:54:41 +00:00
Timur Iskhodzhanov b320527e39 Update test/MC/ARM/coff-debugging-secrel.ll expectations to fix breakage caused by r220544
llvm-svn: 220548
2014-10-24 06:24:07 +00:00
Timur Iskhodzhanov 2bc90fdbdc Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+
Reviewed at http://reviews.llvm.org/D5772

llvm-svn: 220544
2014-10-24 01:27:45 +00:00
Ahmed Bougacha 26f63f77cd Make test for r220533 more robust by using GPR pattern.
llvm-svn: 220541
2014-10-24 00:03:46 +00:00
Adam Nemet 832ec5e911 [AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.

For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.

For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant.  The addition operand (op3) in both cases can come from
memory.  Again the ony difference is which operand is destructed.

There could be a post-RA pass that would convert a 213 or 132 into a 231.

Part of <rdar://problem/17082571>

llvm-svn: 220540
2014-10-24 00:03:00 +00:00
Ahmed Bougacha 7daf3b89f9 [SelectionDAG] Teach the vector scalarizer about FP conversions.
This adds support for legalization of instructions of the form:

  [fp_conv] <1 x i1> %op to <1 x double>

where fp_conv is one of fpto[us]i, [us]itofp.  This used to assert
because they were simply missing from the vector operand scalarizer.

A similar problem arose in r190830, with trunc instead.

Fixes PR20778.

Differential Revision: http://reviews.llvm.org/D5810

llvm-svn: 220533
2014-10-23 22:49:25 +00:00
Tim Northover e4c7be56bf ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.

The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.

Should fix PR20376.

llvm-svn: 220529
2014-10-23 22:31:48 +00:00
David Blaikie 1dd573db45 DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.

The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.

(though, let's see what the buildbots have to say about this - *crosses
fingers*)

There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.

llvm-svn: 220527
2014-10-23 22:27:50 +00:00
Timur Iskhodzhanov 56af52f852 PR21189: Teach llvm-readobj to dump bits of COFF symbol subsections required to debug using VS2012+
Reviewed at http://reviews.llvm.org/D5755
Thanks to Andrey Guskov for his help investigating this!

llvm-svn: 220526
2014-10-23 22:25:31 +00:00
Ahmed Bougacha 5175bcf43a [X86] Improve mul w/ overflow codegen, to MUL8+SETO.
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.

This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.

i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.

Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.

A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):

  // FIXME: Used for 8-bit mul, ignore result upper 8 bits.
  // This probably ought to be moved to a def : Pat<> if the
  // syntax can be accepted.
  [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]

Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.

Before this change:
  movsbl  %sil, %eax
  movsbl  %dil, %ecx
  imull   %eax, %ecx
  movb    %cl, %al
  sarb    $7, %al
  movzbl  %al, %eax
  movzbl  %ch, %esi
  cmpl    %eax, %esi
  setne   %al

After:
  movb    %dil, %al
  imulb   %sil
  seto    %al

Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.

Differential Revision: http://reviews.llvm.org/D5809

llvm-svn: 220516
2014-10-23 21:55:31 +00:00
Sanjay Patel 848309da7c Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding 
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.

No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.

I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.

Differential Revision: http://reviews.llvm.org/D5919

llvm-svn: 220514
2014-10-23 21:52:45 +00:00
Peter Collingbourne dfc98c2e61 Make llvm-go test dependency optional.
llvm-svn: 220503
2014-10-23 19:51:40 +00:00
Kevin Enderby 6f326ce75b Update llvm-objdump’s Mach-O symbolizer code for Objective-C references.
This prints disassembly comments for Objective-C references to CFStrings,
Selectors, Classes and method calls.

llvm-svn: 220500
2014-10-23 19:37:31 +00:00
Rafael Espindola 52b249b9f4 Cleanup this test a bit.
Use simpler names and remove unnecessary fields.

llvm-svn: 220499
2014-10-23 19:36:21 +00:00
Rafael Espindola 2c301f5c8c Cleanup this test a bit.
Use simpler names and remove unnecessary fields.

llvm-svn: 220498
2014-10-23 19:23:42 +00:00
David Blaikie 49cfc8ca8e DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.

It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which tehy are written
(eg: namespace n { class foo { static int i; } int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).

Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.

Based on patches and discussion with Frédéric.

llvm-svn: 220497
2014-10-23 19:12:43 +00:00
Rafael Espindola 28d6b27bba Make this test a bit stricter.
This now:
* Forces the linker to include the internal definition.
* Checks the full output.

llvm-svn: 220495
2014-10-23 18:52:46 +00:00
Rafael Espindola c16ec3ed42 Make this test a bit stricter.
This now:
* Forces the linker to include the internal definition.
* Checks the full output.

llvm-svn: 220494
2014-10-23 18:44:07 +00:00
Reid Kleckner 5b2787dfb2 Revert "Don't count inreg params when mangling fastcall functions"
This reverts commit r214981.

I'm not sure what I was thinking when I wrote this. Testing with MSVC
shows that this function is mangled to '@f@8':
  int __fastcall f(int a, int b);

llvm-svn: 220492
2014-10-23 17:50:42 +00:00
Renato Golin 6fb9c2ea70 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

llvm-svn: 220486
2014-10-23 15:31:50 +00:00
NAKAMURA Takumi 504bbf91cd Revert r220427, "[Hexagon] Adding encoding bits for add opcode."
It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc.

llvm-svn: 220478
2014-10-23 11:31:22 +00:00
Zoran Jovanovic 42b8444372 [mips][microMIPS] Implement ADDIUR1SP instruction
Differential Revision: http://reviews.llvm.org/D5153

llvm-svn: 220477
2014-10-23 11:13:59 +00:00
Zoran Jovanovic bac3619b29 ps][microMIPS] Implement ADDIUR2 instruction
Differential Revision: http://reviews.llvm.org/D5151

llvm-svn: 220476
2014-10-23 11:06:34 +00:00
Zoran Jovanovic 9bda2f1926 ps][microMIPS] Implement LI16 instruction
Differential Revision: http://reviews.llvm.org/D5149

llvm-svn: 220475
2014-10-23 10:59:24 +00:00
Zoran Jovanovic 4a00fdc2e3 [mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D5774

llvm-svn: 220474
2014-10-23 10:42:01 +00:00
Oliver Stannard 39a85abddf [Thumb2] Improve disassembly of memory hints
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.

llvm-svn: 220472
2014-10-23 08:52:58 +00:00
Akira Hatanaka 2ee0e9e6ee [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489

llvm-svn: 220470
2014-10-23 04:17:05 +00:00
Frederic Riss e939b43aa4 [dwarfdump] Dump DW_AT_ranges values inline in the debug_info dump.
The output looks like that:
                      DW_AT_ranges [FORM_data4]    (0x00000000
                         [0x00000001000024a0 - 0x00000001000024c2)
                         [0x0000000100002505 - 0x000000010000268b))

Differential Revision: http://reviews.llvm.org/D5712

llvm-svn: 220466
2014-10-23 04:08:34 +00:00
Peter Collingbourne 244ecf55bd Add llvm-go tool.
This tool lets us build LLVM components within the tree by setting up a
$GOPATH that resembles a tree fetched in the normal way with "go get".

It is intended that components such as the Go frontend will be built in-tree
using this tool.

Differential Revision: http://reviews.llvm.org/D5902

llvm-svn: 220462
2014-10-23 02:33:23 +00:00
Derek Schuff 1fd051bfe8 Fix Mips nacl-mask test for new bundle-aligned label behavior
After r220439 the behavior of labels in bundle-align mode changed,
and I neglected to update this test.

llvm-svn: 220447
2014-10-22 23:32:00 +00:00
Derek Schuff 5f708e5ec8 [MC] Attach labels to existing fragments instead of using a separate fragment
Summary:
Currently when emitting a label, a new data fragment is created for it if the
current fragment isn't a data fragment.
This change instead enqueues the label and attaches it to the next fragment
(e.g. created for the next instruction) if possible.

When bundle alignment is not enabled, this has no functionality change (it
just results in fewer extra fragments being created). For bundle alignment,
previously labels would point to the beginning of the bundle padding instead
of the beginning of the emitted instruction. This was not only less efficient
(e.g. jumping to the nops instead of past them) but also led to miscalculation
of the address of the GOT (since MC uses a label difference rather than
emitting a "." symbol).

Fixes https://code.google.com/p/nativeclient/issues/detail?id=3982

Test Plan: regression test attached

Reviewers: jvoung, eliben

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D5915

llvm-svn: 220439
2014-10-22 22:38:06 +00:00
Colin LeMahieu 73a51a1a68 [Hexagon] Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.

http://reviews.llvm.org/D5624

llvm-svn: 220427
2014-10-22 20:58:35 +00:00
Chad Rosier dcd2a3014c [AArch64] Add support for the .inst directive.
This has been implement using the MCTargetStreamer interface as is done in the
ARM, Mips and PPC backends.

Phabricator: http://reviews.llvm.org/D5891
PR20964

llvm-svn: 220422
2014-10-22 20:35:57 +00:00
Justin Bogner 72d1f2b61b test: Make this test runnable in directories with @ in their names
Jenkins likes to use directories with names involving the '@'
character, which breaks the sed expression in this test. Switch to use
'|' on the assumption that it's less likely to show up in a path.

llvm-svn: 220401
2014-10-22 18:18:54 +00:00
Bill Schmidt 9c54bbd791 [PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases.  This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled.  The changes are analogous to those in
the previous patch.  I've added a new variant to vsx.ll to test the
code generation.

(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)

llvm-svn: 220395
2014-10-22 16:58:20 +00:00
Sanjay Patel a92fa44740 Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics 
(via function attribute for now because there is no IR-level FMF on calls), 
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.

We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.

I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.

Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.

This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850

Differential Revision: http://reviews.llvm.org/D5893

llvm-svn: 220390
2014-10-22 15:29:23 +00:00
Diego Novillo a67c0b43e1 Change error to warning when a profile cannot be found.
When the profile for a function cannot be applied, we use to emit an
error. This seems extreme. The compiler can continue, it's just that the
optimization opportunities won't include profile information.

llvm-svn: 220386
2014-10-22 13:36:35 +00:00
Diego Novillo 8027b80b41 Support using sample profiles with partial debug info.
Summary:
When using a profile, we used to require the use -gmlt so that we could
get access to the line locations. This is used to match line numbers in
the input profile to the line numbers in the function's IR.

But this is actually not necessary. The driver can provide source
location tracking without the emission of debug information. In these
cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the
actual line location annotations are still present.

This patch adds a new way of looking for the start of the current
function. Instead of looking through the compile units in llvm.dbg.cu,
we can walk up the scope for the first instruction in the function with
a debug loc. If that describes the function, we use it. Otherwise, we
keep looking until we find one.

If no such instruction is found, we then give up and produce an error.

Reviewers: echristo, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5887

llvm-svn: 220382
2014-10-22 12:59:00 +00:00
Arnaud A. de Grandmaison 9b3330546b [AArch64] Cleanup A57PBQPConstraints
And add a long awaited testcase.

llvm-svn: 220381
2014-10-22 12:40:20 +00:00
Bruno Cardoso Lopes c29520c5b3 [InstSimplify] Support constant folding to vector of pointers
ConstantFolding crashes when trying to InstSimplify the following load:

@a = private unnamed_addr constant %mst {
     i8* inttoptr (i64 -1 to i8*),
     i8* inttoptr (i64 -1 to i8*)
}, align 8

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8

This patch fix this by adding support to this type of folding:

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8
==> gets folded to:
  %x = <2 x i8*> <i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*)>

llvm-svn: 220380
2014-10-22 12:18:48 +00:00
Jyoti Allur 3b68607eac [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode
llvm-svn: 220379
2014-10-22 10:41:14 +00:00
Matt Arsenault 0cf39569bf R600/SI: Add another failing testcase for i1 copies
It's not handling phis.

llvm-svn: 220371
2014-10-22 05:30:42 +00:00
Matt Arsenault 59102d38fb R600/SI: Add failing testcase reduced from OpenCV
This fails the verifier with:
"Expected a VCSrc_32 register, but got a VReg_1 register"

llvm-svn: 220368
2014-10-22 04:26:10 +00:00
Rafael Espindola 68bae2c7f6 Handle spaces and quotes in file names in MRI scripts.
llvm-svn: 220364
2014-10-22 03:10:56 +00:00
Hans Wennborg 0b39fc0d16 Revert "Teach the load analysis to allow finding available values which require" (r220277)
This seems to have caused PR21330.

llvm-svn: 220349
2014-10-21 23:49:52 +00:00
Lang Hames 41d95947cf [MCJIT] Defer application of AArch64 MachO GOT relocations until resolve time.
On AArch64, GOT references are page relative (ADRP + LDR), so they can't be
applied until we know exactly where, within a page, the GOT entry will be in
the target address space.

Fixes <rdar://problem/18693976>.

llvm-svn: 220347
2014-10-21 23:41:15 +00:00
Rafael Espindola 915fbb3590 MRI scripts: Add addlib support.
llvm-svn: 220346
2014-10-21 23:18:51 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
Matt Arsenault 75c658e2cc R600/SI: Add missing parameter to div_fmas intrinsic
llvm-svn: 220338
2014-10-21 22:20:55 +00:00
Rafael Espindola 8a4635224b Overwrite instead of adding to archives when creating them in mri scripts.
This matches the behavior of GNU ar and also makes it easier to implemnt
support for the addlib command.

llvm-svn: 220336
2014-10-21 21:56:47 +00:00
Matt Arsenault 8c4fb7cae0 R600: Use default GlobalDirective
The overridden one wasn't inserting a space,
so you would end up with .globalfoo

llvm-svn: 220329
2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Arnaud A. de Grandmaison ece7fe0e16 [PBQP] Add a testcase for r220302: Fix coalescing benefits
llvm-svn: 220316
2014-10-21 20:10:21 +00:00
David Majnemer d205602a0b InstCombine: Simplify FoldICmpCstShrCst
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify.  Remove
this extra code and move the tests over to InstSimplify.  Add asserts to
make sure our preconditions hold before we make any assumptions.

llvm-svn: 220314
2014-10-21 19:51:55 +00:00
Rafael Espindola f03ae4efa7 Drop support for an old version of ld64 (from darwin 9).
llvm-svn: 220310
2014-10-21 18:31:09 +00:00
Rafael Espindola 4bbdeda8be Convert two tests to use llvm-readobj.
llvm-svn: 220308
2014-10-21 18:24:31 +00:00
Matt Arsenault e306a32325 R600/SI: Add pattern for bswap
llvm-svn: 220304
2014-10-21 16:25:08 +00:00
Rafael Espindola c9b33ff9ba Add support for addmod to mri scripts.
llvm-svn: 220294
2014-10-21 14:46:17 +00:00
Bill Schmidt 5c6cb813b6 [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VMX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.

llvm-svn: 220290
2014-10-21 13:02:37 +00:00
Oliver Stannard cdb8db8d3c [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.

llvm-svn: 220288
2014-10-21 11:49:14 +00:00
Yuri Gorshenin 171eb8dbeb [asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register.
Summary: Fixed memory accesses with rbp as a base or an index register.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5819

llvm-svn: 220283
2014-10-21 10:22:27 +00:00
Oliver Stannard 38e6d45a46 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).

llvm-svn: 220278
2014-10-21 09:14:15 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Zoran Jovanovic 592239d498 [mips][microMIPS] Implement ADDU16 and SUBU16 instructions
Differential Revision: http://reviews.llvm.org/D5118

llvm-svn: 220276
2014-10-21 08:44:58 +00:00
Zoran Jovanovic 81ceebc56e [mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions
Differential Revision: http://reviews.llvm.org/D5117

llvm-svn: 220275
2014-10-21 08:32:40 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Paul Robinson f60e0a160f Do not attribute static allocas to the call site's DebugLoc.
When functions are inlined, instructions without debug information are
attributed to the call site's DebugLoc. After inlining, inlined static
allocas are moved to the caller's entry block, adjacent to the caller's
original static alloca instructions. By retaining the call site's
DebugLoc, these instructions could cause instructions that were
subsequently inserted at the entry block to pick up the same DebugLoc.

Patch by Wolfgang Pieb!

llvm-svn: 220255
2014-10-21 01:00:55 +00:00
Rafael Espindola f16a66973c Make this test a bit more strict.
llvm-svn: 220253
2014-10-21 00:47:49 +00:00
Chandler Carruth 97192421e1 Teach lit to filter the host LDFLAGS down from the build system and into
the CGO build environment. This lets things like -rpath propagate down
to the C++ code that is built along side the Go bindings when testing
them.

Patch by Peter Collingbourne, and verified that it works by me.

llvm-svn: 220252
2014-10-21 00:36:28 +00:00
Lang Hames 2d0d096bd1 [MCJIT] Temporarily revert r220245 - it broke several bots.
(See e.g. http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/17653)

llvm-svn: 220249
2014-10-21 00:24:02 +00:00
Philip Reames bf9676f7f0 Extend the verifier to validate range metadata on calls and invokes.
Range metadata applies to loads, call, and invokes.  We were validating that metadata applied to loads was correct according to the LangRef, but we were not validating metadata applied to calls or invokes.  This change extracts the checking functionality to a common location, reuses it for all valid locations, and adds a simple test to ensure a misused range on a call gets reported.

llvm-svn: 220246
2014-10-20 23:52:07 +00:00
Lang Hames 84801c217c [MCJIT] Make MCJIT honor symbol visibility settings when populating the global
symbol table.

Patch by Anthony Pesch. Thanks Anthony!

llvm-svn: 220245
2014-10-20 23:39:54 +00:00
Quentin Colombet 06355199f1 [X86] Fix a bug in the lowering of the mask of VSELECT.
X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT
when it knows it can be lowered into BLEND. Indeed, only the high bits need to be
set for those and it optimizes those accordingly.
However, when the mask is a compile time constant, the lowering will be handled
by the generic optimizer and those modifications will generate bad code in the
generic optimizer.

This patch fixes that by preventing the optimization if the VSELECT will be
handled by the generic optimizer.

<rdar://problem/18675020>

llvm-svn: 220242
2014-10-20 23:13:30 +00:00
Philip Reames cdb72f369f Introduce a 'nonnull' metadata on Load instructions.
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns.  Long term, it would be nice to combine these into a single construct.   The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.

Reviewed by: Hal Finkel
Differential Revision:  http://reviews.llvm.org/D5220

llvm-svn: 220240
2014-10-20 22:40:55 +00:00
Simon Pilgrim 2f9548a3ef [X86] Memory folding for commutative instructions (updated)
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt.

Added additional regression test case provided by Joerg Sonnenberger.

Differential Revision: http://reviews.llvm.org/D5818

llvm-svn: 220239
2014-10-20 22:14:22 +00:00
Tim Northover 23075ccee7 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

llvm-svn: 220236
2014-10-20 21:28:41 +00:00
Gerolf Hoflehner c47baed0b5 [AArch64] test case for compfail fixed by r219748
llvm-svn: 220206
2014-10-20 16:08:33 +00:00
Oliver Stannard 6672f37ed2 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.

llvm-svn: 220204
2014-10-20 15:37:35 +00:00
Oliver Stannard e8f63a54b4 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396

llvm-svn: 220196
2014-10-20 11:30:35 +00:00
Oliver Stannard fce039240a [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.

llvm-svn: 220194
2014-10-20 11:00:18 +00:00
Chandler Carruth a32038b006 Fix a miscompile introduced in r220178.
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.

I've added tests for both the alloca and global case here.

llvm-svn: 220190
2014-10-20 10:03:01 +00:00
Chandler Carruth 6665d62117 Fix a somewhat subtle pair of issues with JumpThreading I introduced in
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.

All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.

llvm-svn: 220186
2014-10-20 05:34:36 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth b5f4c32830 Add a datalayout string to this test so that it exercises the full gamut
of InstCombine rather than just the bits enabled when datalayout is
optional.

The primary fixes here are because now things are little endian.

In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.

llvm-svn: 220176
2014-10-20 00:11:31 +00:00
Bill Schmidt 941a3244ff [PowerPC] Clean up -mattr=+vsx tests to always specify -mcpu
We recently discovered an issue that reinforces what a good idea it is
to always specify -mcpu in our code generation tests, particularly for
-mattr=+vsx.  This patch ensures that all tests that specify
-mattr=+vsx also specify -mcpu=pwr7 or -mcpu=pwr8, as appropriate.

Some of the uses of -mattr=+vsx added recently don't make much sense
(when specified for -mtriple=powerpc-apple-darwin8 or -march=ppc32,
for example).  For cases like this I've just removed the extra VSX
test commands; there's enough coverage without them.

llvm-svn: 220173
2014-10-19 21:29:21 +00:00
Bill Schmidt 87982a1e9b [PowerPC] Temporarily disable VSX for PowerPC fast-isel tests
Patch by Bill Seurer; some comment formatting changes by me.

There are a few PowerPC test cases for FastISel support that currently
fail with VSX support enabled.  The temporary workaround under
discussion in http://reviews.llvm.org/D5362 helps, but the tests still
fail because they specify -fast-isel-abort, and the VSX workaround
punts back to SelectionDAG.  We have plans to fix FastISel permanently
for VSX, but until that's in place these tests are preventing us from
enabling VSX by default.  Therefore we are adding -mattr=-vsx to these
tests until the full support is ready.

llvm-svn: 220172
2014-10-19 20:48:47 +00:00
Bill Schmidt fff860979d [PowerPC] Re-enable VSX test line for fma.ll with -mcpu=pwr7
The VSX testing variant in test/CodeGen/PowerPC/fma.ll had to be
disabled because of unexpected behavior on many of the builders.  I
tracked this down to a situation that occurs when the VSX attribute is
enabled for a target that disables the MI early scheduling pass.  This
patch adds -mcpu=pwr7 to make this predictable.  The other issue will
be addressed separately.

llvm-svn: 220171
2014-10-19 20:27:56 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
Chandler Carruth 5b8cd2f73c Move previously dead code to handle computing the known bits of an alias
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.

This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.

llvm-svn: 220164
2014-10-19 09:06:56 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 71009cad95 Remove a test that was ported from the old llvm-gcc frontend test suite.
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.

The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.

Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.

llvm-svn: 220137
2014-10-18 06:36:18 +00:00
Nick Kledzik 4e1ef9951d [llvm-objdump] don't test timestamp dump as that is time zone dependent
llvm-svn: 220123
2014-10-18 02:28:01 +00:00
Nick Kledzik 600f245b8a [llvm-objdump] enhance test case for mach-o -private-headers
llvm-svn: 220120
2014-10-18 01:50:55 +00:00
Nick Kledzik 3b2aa057e6 [llvm-objdump] Fix mach-o binding decompression error
llvm-svn: 220119
2014-10-18 01:21:02 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Aaron Watry 8114437a8f R600/SI: Add global atomicrmw xchg
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220110
2014-10-17 23:33:03 +00:00
Aaron Watry d672ee2a47 R600/SI: Add global atomicrmw xor
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220109
2014-10-17 23:33:01 +00:00
Aaron Watry 8a911e6926 R600/SI: Add global atomicrmw or
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220108
2014-10-17 23:32:59 +00:00
Aaron Watry 58c9992f15 R600/SI: Add global atomicrmw min/umin
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220107
2014-10-17 23:32:57 +00:00
Aaron Watry 29f295d7a5 R600/SI: Add global atomicrmw max/umax
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220106
2014-10-17 23:32:56 +00:00
Aaron Watry 621278034c R600/SI: Add global atomicrmw and
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220105
2014-10-17 23:32:54 +00:00
Aaron Watry 328f1bae8e R600/SI: Add global atomicrmw sub
v2: Add separate offset/no-offset tests

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220104
2014-10-17 23:32:52 +00:00
Aaron Watry 28682cf205 R600/SI: Fix/add tests for atomicrmw add
The previous tests claimed to test constant offsets in the function name,
but the tests weren't actually testing them.

Clone the tests, and do testing of all combinations of the following:
1) with/without constant pointer offset
2) 32/64-bit addressing modes
3) Usage and non-usage of the return value from the atomicrmw

Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220103
2014-10-17 23:32:50 +00:00
Aaron Watry 1d13d36520 R600: Rename atomic_load global tests to atomic_add
The function name now matches what it's actually testing.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220102
2014-10-17 23:32:49 +00:00
Evgeniy Stepanov e08633e900 [msan] Fix handling of byval arguments with large alignment.
MSan param-tls slots are 8-byte aligned. This change clips
alignment of memcpy into param-tls to 8.

llvm-svn: 220101
2014-10-17 23:29:44 +00:00
Pete Cooper 230332f4fe Check for dynamic alloca's when selecting lifetime intrinsics.
TL;DR: Indexing maps with [] creates missing entries.

The long version:

When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime.  Trouble is, we don't first check to see if this is a dynamic alloca.

On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime.  0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots.

PEI would later trigger an assert because it expects the stack protector to not be dead.

This fix ensures that we only get frame indices for static allocas, ie, those in the map.  Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.

rdar://problem/18672951

llvm-svn: 220099
2014-10-17 22:59:33 +00:00
Bill Schmidt 296bb31f64 [PowerPC] Disable +vsx RUN line for fma.ll due to inconsistency on other builders
llvm-svn: 220094
2014-10-17 21:32:22 +00:00
Rafael Espindola 7da1ea83a9 Revert "TRE: make TRE a bit more aggressive"
This reverts commit r219899.

This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.

llvm-svn: 220093
2014-10-17 21:25:48 +00:00
Bill Schmidt a087d74250 [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.

llvm-svn: 220090
2014-10-17 21:02:44 +00:00
Peter Collingbourne ce80084446 Disable ccache for go tests.
Should fix llvm-clang-lld-x86_64-debian-fast bot.

llvm-svn: 220071
2014-10-17 18:32:36 +00:00
Matt Arsenault d282ada508 R600/SI: Allow commuting with source modifiers
llvm-svn: 220066
2014-10-17 18:00:48 +00:00
Matt Arsenault 6d3cd544bb R600/SI: Allow comuting fp immediates
llvm-svn: 220062
2014-10-17 18:00:39 +00:00
Peter Collingbourne fde87a0cdf We also need to catch OSError here.
llvm-svn: 220058
2014-10-17 17:46:46 +00:00
Matt Arsenault 83a535ff6b R600/SI: Remove SI_BUFFER_RSRC pseudo
Just use REG_SEQUENCE directly, so there are fewer
instructions to need to deal with later.

llvm-svn: 220056
2014-10-17 17:42:56 +00:00
Juergen Ributzka ad2363f9ee [Stackmaps] Enable invoking the patchpoint intrinsic.
Patch by Kevin Modzelewski
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits, reames

Differential Revision: http://reviews.llvm.org/D5634

llvm-svn: 220055
2014-10-17 17:39:00 +00:00
Andrea Di Biagio c48cb86f05 [X86] Fix missed selection of non-temporal store of zero vector.
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.

This fixes PR19370.

llvm-svn: 220054
2014-10-17 17:27:06 +00:00
James Molloy f497d5511d [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering.
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.

While we're at it, avoid writing to std::vector::end()...

Bug found with random testing and a lot of coffee.

llvm-svn: 220051
2014-10-17 17:06:31 +00:00
Rafael Espindola 44c661b611 Don't crash if find_executable return None.
This was crashing when trying to run the tests on Windows.

llvm-svn: 220048
2014-10-17 16:07:43 +00:00
Bill Schmidt 2d1128acb2 [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.

llvm-svn: 220047
2014-10-17 15:13:38 +00:00
Jan Vesely b535c902e6 R600: Add EG to FMA test
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220045
2014-10-17 14:45:27 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Vasileios Kalintiris 238692beb9 [mips] Add support for COP1's Branch-On-Cond-Likely instructions
Summary: Depends on D5782

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5802

llvm-svn: 220042
2014-10-17 14:08:28 +00:00
Vasileios Kalintiris 6d1e64896d [mips] Add support for COP0's Branch-On-Cond-Likely instructions
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5782

llvm-svn: 220036
2014-10-17 12:38:35 +00:00
Hal Finkel dd38c0b876 [DSE] Remove no-data-layout-only type-based overlap checking
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.

llvm-svn: 220035
2014-10-17 11:56:00 +00:00
Rafael Espindola b66130209b Add back commits r219835 and a fixed version of r219829.
The only difference from r219829 is using

getOrCreateSectionSymbol(*ELFSec)

instead of

GetOrCreateSymbol(ELFSec->getSectionName())

in ELFObjectWriter which causes us to use the correct section symbol even if
we have multiple sections with the same name.

Original messages:

r219829:
Correctly handle references to section symbols.

When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

r219835:
Allow forward references to section symbols.

llvm-svn: 220021
2014-10-17 01:48:58 +00:00
Bill Schmidt 32e9c6465b [PPC] Adjust some PowerPC tests to account for presence/absence of VSX
Patch by Bill Seurer; committed on his behalf.

These test cases generate slightly different code sequences when VSX
is activated and thus fail. The update turns off VSX explicitly for
the existing checks and then adds a second set of checks for most of
them that test the VSX instruction output.

llvm-svn: 220019
2014-10-17 01:41:22 +00:00
Rafael Espindola 0eb2cec936 Add a test that would have found the bug in r219829.
llvm-svn: 220016
2014-10-17 01:34:23 +00:00
Akira Hatanaka 0d0c78180d ARM: Fix a bug which was causing convergence failure in constant-island pass.
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:

// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}

The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.

This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.

rdar://problem/18581150

llvm-svn: 220015
2014-10-17 01:31:47 +00:00
Rafael Espindola 4544a4062c Revert commit r219835 and r219829.
Revert "Correctly handle references to section symbols."
Revert "Allow forward references to section symbols."

Rui found a regression I am debugging.

llvm-svn: 220010
2014-10-17 01:06:02 +00:00
Peter Zotov 9e40430102 [OCaml] Add Llvm.instr_clone.
llvm-svn: 220008
2014-10-17 01:02:40 +00:00
Alexander Potapenko 7aaf514092 [llvm-symbolizer] Introduce the -dsym-hint option.
llvm-symbolizer will consult one of the .dSYM paths passed via -dsym-hint
if it fails to find the .dSYM bundle at the default location.

llvm-svn: 220004
2014-10-17 00:50:19 +00:00
Peter Collingbourne 659670d933 Add our own copy of the find_executable function to cope with installations
that do not have the distutils.spawn package. Should hopefully fix the
aarch64 buildbot.

llvm-svn: 219991
2014-10-16 23:43:20 +00:00
Peter Collingbourne 82e3e373b3 Initial version of Go bindings.
This code is based on the existing LLVM Go bindings project hosted at:
https://github.com/go-llvm/llvm

Note that all contributors to the gollvm project have agreed to relicense
their changes under the LLVM license and submit them to the LLVM project.

Differential Revision: http://reviews.llvm.org/D5684

llvm-svn: 219976
2014-10-16 22:48:02 +00:00
Robin Morisset e2de06bef6 Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Matt Arsenault a3fe7c62d1 R600: Fix nonsensical implementation of computeKnownBits for BFE
This was resulting in invalid simplifications of sdiv

llvm-svn: 219953
2014-10-16 20:07:40 +00:00
Rafael Espindola 11aaaeebe0 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Bjorn Steinbrink d20816fde9 Allow call-slop optzn for destinations with a suitable dereferenceable attribute
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950
2014-10-16 19:43:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Juergen Ributzka 03a0611061 [AArch64] Fix miscompile of sdiv-by-power-of-2.
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.

This fixes rdar://problem/18678801.

llvm-svn: 219934
2014-10-16 16:41:15 +00:00
Vasileios Kalintiris 167c372118 [mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes.
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5753

llvm-svn: 219931
2014-10-16 15:41:51 +00:00
Vasileios Kalintiris 711028f718 [mips] Marked the DI/EI instruction aliases as MIPS32r2
Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5751

llvm-svn: 219927
2014-10-16 15:23:52 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Saleem Abdulrasool 7f52921976 TRE: make TRE a bit more aggressive
Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899
2014-10-16 03:27:30 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Sanjoy Das 360b1ed5f2 Revert "r219834 - Teach ScalarEvolution to sharpen range information"
This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468

llvm-svn: 219878
2014-10-15 23:46:04 +00:00
Hal Finkel 68dc3c7ab2 Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876
2014-10-15 23:44:41 +00:00
Adam Nemet 4285c1f8cc [AVX512] Add DQ subvector inserts
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively.  These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).

Since DQ has native support for these intructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass.  The "Alt" Pat<>'s are disabled with DQ.

Fixes <rdar://problem/18426089>

llvm-svn: 219874
2014-10-15 23:42:17 +00:00
Adam Nemet 2b71ca5d88 [AVX512] Add SKX testing to avx512-insert-extract.ll
This is in preparation to adding DQ subvector inserts to this testcase.

llvm-svn: 219873
2014-10-15 23:42:14 +00:00
Adam Nemet f3aba14dad [AVX512] Fix test to produce a defined value
We're inserting into a 8 wide vector, so the index should be < 8.

llvm-svn: 219872
2014-10-15 23:42:11 +00:00
Tom Stellard c8d7920ad9 R600/SI: Fix bug where immediates were being used in DS addr operands
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.

This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.

This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created.  This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.

llvm-svn: 219848
2014-10-15 21:08:59 +00:00
Rafael Espindola 78206c3576 Allow forward references to section symbols.
llvm-svn: 219835
2014-10-15 19:30:18 +00:00
Sanjoy Das 90c2f1455a Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.
    
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
2014-10-15 19:25:28 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Juergen Ributzka f82c987a5c Reapply "[FastISel][AArch64] Add custom lowering for GEPs."
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.

The original commit had a bug in the add emit logic, which has been fixed.

llvm-svn: 219831
2014-10-15 18:58:07 +00:00
Rafael Espindola a74b5e6823 Correctly handle references to section symbols.
When processing assembly like

.long .text

we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.

This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.

The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.

llvm-svn: 219829
2014-10-15 18:55:30 +00:00
Matt Arsenault 1a74aff846 R600/SI: Also try to use 0 base for misaligned 8-byte DS loads.
llvm-svn: 219823
2014-10-15 18:06:43 +00:00
Matt Arsenault 7b68fdf3c0 R600: Fix miscompiles when BFE has multiple uses
SimplifyDemandedBits would break the other uses of the operand.

llvm-svn: 219819
2014-10-15 17:58:34 +00:00
Hal Finkel 3b7fc86677 [SLPVectorize] Basic ephemeral-value awareness
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816
2014-10-15 17:35:01 +00:00
Derek Schuff 05fb735f3a [MC] Make bundle alignment mode setting idempotent and support nested bundles
Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.

Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.

These changes make the bundle aligment simpler to use in the compiler, and
also better match the corresponding support in GNU as.

Reviewers: jvoung, eliben

Differential Revision: http://reviews.llvm.org/D5801

llvm-svn: 219811
2014-10-15 17:10:04 +00:00
Juergen Ributzka 42379d4cf7 Revert "[FastISel][AArch64] Add custom lowering for GEPs."
This breaks our internal build bots. Reverting it to get the bots green again.

llvm-svn: 219776
2014-10-15 04:55:48 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Nick Kledzik 51d2c2bf85 [llvm-objdump] Update error message and add test case for mach-o file with bad library ordinals
llvm-svn: 219746
2014-10-14 23:29:38 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Hal Finkel 1a600faba0 [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).

llvm-svn: 219741
2014-10-14 22:59:49 +00:00