Commit Graph

81796 Commits

Author SHA1 Message Date
Michael Zolotukhin 80d13bac02 [Unroll] Add debug dumps to loop-unroll analyzer.
llvm-svn: 243471
2015-07-28 20:07:29 +00:00
Vasileios Kalintiris 9ec6114860 [mips][FastISel] Fix generated code for IR's select instruction.
Summary:
Generate correct code for the select instruction by zero-extending
it's boolean/condition operand to GPR-width. This is necessary because
the conditional-move instructions operate on the whole register.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11506

llvm-svn: 243469
2015-07-28 19:57:25 +00:00
Michael Zolotukhin a425c9d0e3 [Unroll] Don't analyze blocks outside the loop.
llvm-svn: 243466
2015-07-28 19:21:21 +00:00
Matt Arsenault 7227cc1a48 AMDGPU: Don't try to use LDS/vector for private if pointer value stored
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.

llvm-svn: 243462
2015-07-28 18:47:00 +00:00
Matt Arsenault fdcd39a8ad AMDGPU: Fix crash if called function is a bitcast
getCalledFunction() is null, so this would crash. Replace
crash with an error on unsupported call.

llvm-svn: 243461
2015-07-28 18:29:14 +00:00
Jingyue Wu 42f1d67a45 [SCEV] Apply NSW and NUW flags via poison value analysis
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.

This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:

  for (int i = 0; i < limit; ++i) {
    sum += ptr[i + offset];
  }

Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.


There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392

Patch by Bjarke Roune

Reviewers: eliben, atrick, sanjoy

Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits

Differential Revision: http://reviews.llvm.org/D11212

llvm-svn: 243460
2015-07-28 18:22:40 +00:00
Matt Arsenault 916cea5682 AMDGPU: Fix return type of getImplicitParameterOffset.
Patch by Zoltan Gilian <zoltan.gilian@gmail.com>

llvm-svn: 243459
2015-07-28 18:09:55 +00:00
Lang Hames 2e88f4fc5f [RuntimeDyld] Make LoadedObjectInfo::getLoadedSectionAddress take a SectionRef
rather than a string section name.

llvm-svn: 243456
2015-07-28 17:52:11 +00:00
Alex Lorenz deb534907e MIR Serialization: Serialize the block address machine operands.
llvm-svn: 243453
2015-07-28 17:28:03 +00:00
JF Bastien ae7eebd429 WebAssembly: MCAsmInfo only has one syntax variant for now.
Summary: MCAsmInfo is set up with the default AssemblerDialect, which is zero.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11567

llvm-svn: 243452
2015-07-28 17:23:07 +00:00
Alex Lorenz 41df7d3d10 MIR Parser: Extract the method 'parseGlobalValue'. NFC.
This commit extracts the code that parses a global value from the method
'parseGlobalAddressOperand' into a new method 'parseGlobalValue', so that this
code can be reused by the method which will parse the block address machine
operands.

llvm-svn: 243450
2015-07-28 17:09:52 +00:00
Alex Lorenz 82a1cfdca2 MIR Parser: Move the function 'lexName'. NFC.
This commit moves the function 'lexName' to the start of the file so it can
be reused by the function which will lex the named LLVM IR block references.

llvm-svn: 243449
2015-07-28 17:03:40 +00:00
Alex Lorenz e8ce3e616b MIR Printer: Remove an outdated TODO comment and assertion. NFC.
This commit removes an outdated TODO comment and a corresponding assertion
which asserts that the mir printer can't the print machine basic blocks that
aren't sequentially numbered.

This comment and assertion were correct when I was working on the patch which
serialized the machine basic blocks, but then I decided to add an 'ID'
attribute to the machine basic block's YAML mapping based on the patch review.
This comment and assertion then became invalid as with the 'ID' attribute we
can serialize the non sequential machine basic blocks and their references
without any problems.

llvm-svn: 243447
2015-07-28 16:56:45 +00:00
Alex Lorenz db07c40943 MIR Parser: Remove redundant parameters. NFC.
This commit removes the redundant parameters from the two methods
'initializeRegisterInfo' and 'initializeFrameInfo'. The removed parameters are
redundant as we are already passing in the 'MachineFunction' to those methods,
and those parameters can be derived from the machine function parameter.

llvm-svn: 243445
2015-07-28 16:48:37 +00:00
Chih-Hung Hsieh 1e859582d6 Implement target independent TLS compatible with glibc's emutls.c.
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.

clang and driver changes in http://reviews.llvm.org/D10524

  Added -femulated-tls flag to select the emulated TLS model,
  which will be used for old targets like Android that do not
  support ELF TLS models.

Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.

Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.

TODO: Add proper DIE for emulated TLS variables.
      Added new unit tests with emulated TLS.

Differential Revision: http://reviews.llvm.org/D10522

llvm-svn: 243438
2015-07-28 16:24:05 +00:00
Martell Malone 1eff5c9c09 Summary:
Object: add IMAGE_FILE_MACHINE_ARM64

The official specifications state that the value of IMAGE_FILE_MACHINE_ARM64
is 0xAA64 (as per the Microsoft Portable Executable and Common Object Format
Specification v8.3).

Reviewers: rnk

Subscribers: llvm-commits, compnerd, ruiu

Differential Revision: http://reviews.llvm.org/D11511

llvm-svn: 243434
2015-07-28 16:18:17 +00:00
Bruno Cardoso Lopes 51fd242cfc [LVI] Cleanup whitespaces. NFC
llvm-svn: 243430
2015-07-28 15:53:21 +00:00
Sanjay Patel d411114e77 fix formatting; NFC
llvm-svn: 243424
2015-07-28 15:38:43 +00:00
Geoff Berry c573bf7a5f [AArch64] Match float round and convert to int instructions.
Summary:
Add patterns for doing floating point round with various rounding modes
followed by conversion to int as a single FCVT* instruction.

Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11424

llvm-svn: 243422
2015-07-28 15:24:10 +00:00
Silviu Baranga 4825060059 [LAA] Add clarifying comments for the checking pointer grouping algorithm. NFC
llvm-svn: 243416
2015-07-28 13:44:08 +00:00
Adhemerval Zanella 7bc3319d84 Implement __builtin_thread_pointer
This path add the aarch64 lowering of __builtin_thread_pointer.  It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.

llvm-svn: 243412
2015-07-28 13:03:31 +00:00
Chandler Carruth 99ad7bb49c [GMR] Teach GlobalsModRef to distinguish an important and safe case of
no-alias with non-addr-taken globals: they cannot alias a captured
pointer.

If the non-global underlying object would have been a capture were it to
alias the global, we can firmly conclude no-alias. It isn't reasonable
for a transformation to introduce a capture in a way observable by an
alias analysis. Consider, even if it were to temporarily capture one
globals address into another global and then restore the other global
afterward, there would be no way for the load in the alias query to
observe that capture event correctly. If it observes it then the
temporary capturing would have changed the meaning of the program,
making it an invalid transformation. Even instrumentation passes or
a pass which is synthesizing stores to global variables to expose race
conditions in programs could not trigger this unless it queried the
alias analysis infrastructure mid-transform, in which case it seems
reasonable to return results from before the transform started.

See the comments in the change for a more detailed outlining of the
theory here.

This should address the primary performance regression found when the
non-conservatively-correct path of the alias query was disabled.

Differential Revision: http://reviews.llvm.org/D11410

llvm-svn: 243405
2015-07-28 11:11:11 +00:00
Michael Kuperstein cba308cf96 [X86] Remove mergeSPUpdatesUp()
X86FrameLowering has both a mergeSPUpdates() that accepts a direction, and an
mergeSPUpdatesUp(), which seem to do the same thing, except for a slightly 
different interface. Removed the less general function.
NFC.

Differential Revision: http://reviews.llvm.org/D11510

llvm-svn: 243396
2015-07-28 08:56:13 +00:00
Simon Pilgrim df984f58ad [X86][SSE] Use bitmasks instead of shuffles where possible.
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.

Split off from D11518.

Differential Revision: http://reviews.llvm.org/D11541

llvm-svn: 243395
2015-07-28 08:54:41 +00:00
Igor Breger 8352a0ddf2 AVX512: Implemented encoding and intrinsics for VGETEXPSS/D instructions
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11528

llvm-svn: 243390
2015-07-28 06:53:28 +00:00
Puyan Lotfi 567001c281 Changes for MachineBasicBlock to use SortedVector for LiveIns.
llvm-svn: 243389
2015-07-28 06:38:41 +00:00
Mehdi Amini b58f8137c1 Move the Target way of overriding DAG Scheduler to a target hook
Summary:
The previous way of overriding it was relying on calling "setDefault"
on the global registry, which implies global mutable state.

Reviewers: echristo, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11538

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243388
2015-07-28 06:18:04 +00:00
Chandler Carruth 786e6db187 [GMR] Fix a long-standing bug in GlobalsModRef where it failed to clear
out the per-function modref data structures when functions were deleted
or when globals were deleted.

I don't actually know how the global deletion side of this bug hasn't
been hit before, but for the other it just-so-happens that functions
aren't likely to be deleted in the particular part of the LTO pipeline
where we currently enable GMR, so we got lucky.

With this patch, I can self-host with GMR enabled in the normal pass
pipeline!

I was a bit concerned about the compile-time impact of this chang, which
is part of what motivated my prior string of patches to make the
per-function datastructure very dense and fast to walk. With those
changes in place, I can't measure a significant compile time difference
(the difference is around 0.1% which is *way* below the noise) before
and after this patch when building a linked bitcode for all of Clang.

Differential Revision: http://reviews.llvm.org/D11453

llvm-svn: 243385
2015-07-28 06:01:57 +00:00
Adam Nemet 0a674401bf [LDist][LVer] Explicitly pass the set of memchecks to LoopVersioning, NFC
Before the patch, the checks were generated internally in
addRuntimeCheck.  Now, we use the new overloaded version of
addRuntimeCheck that takes the ready-made set of checks as a parameter.

The checks are now generated by the client (LoopDistribution) with the
new RuntimePointerChecking::generateChecks API.

Also the new printChecks API is used to print out the checks for
debugging.

This is to continue the transition over to the new model whereby clients
will get the full set of checks from LAA, filter it and then pass it to
LoopVersioning and in turn to addRuntimeCheck.

llvm-svn: 243382
2015-07-28 05:01:53 +00:00
Craig Topper 7554da2ca3 Remove unnecessary const_casts. NFC
llvm-svn: 243380
2015-07-28 04:28:46 +00:00
Bob Wilson 043ee65ef3 Reserve some constant values for the Swift calling convention.
Swift has a custom calling convention that also requires some new flags
on arguments and one new attribute on alloca instructions. This patch
does not include the implementation of that calling convention - that
will be provided as part of the open-source release of Swift; this only
reserves the bitcode constant values so that they are not used for
other purposes.

llvm-svn: 243379
2015-07-28 04:05:45 +00:00
Kostya Serebryany ae7df1ca4d [libFuzzer] ensure that the dfsan tracing hooks actually run (using -verbosity=3 in tests)
llvm-svn: 243365
2015-07-28 01:25:00 +00:00
Kostya Serebryany 35959592a3 [libFuzzer] when using cmp traces, first check that the CMP is evaluated to one value much more frequently than to the other value (heuristic)
llvm-svn: 243363
2015-07-28 00:59:53 +00:00
Sanjay Patel 8c13e3680d fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25

The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding 
legally.

Differential Revision: http://reviews.llvm.org/D11477

llvm-svn: 243361
2015-07-28 00:48:32 +00:00
David Blaikie 71c9c9ce31 [opaque pointer type] Avoid using pointee types to retrieve InlineAsm's function type
As a stop-gap, retrieving the InlineAsm's function type was done via the
pointee type of its (pointer) Value type.

Instead, pass down and store the FunctionType in the InlineAsm object.

The only wrinkle with this is the ConstantUniqueMap, which then needs to
ferry the FunctionType down through the InlineAsmKeyType. This could be
done a bit differently if the ConstantInfo trait were broadened a bit to
provide an extension point for access to the TypeClass object from the
ValType objects, so that the ConstantUniqueMap<InlineAsm> would then be
keyed on FunctionTypes instead of PointerTypes that point to
FunctionTypes.

This drops the number of IR tests that don't roundtrip through bitcode*
without calling PointerType::getElementType from 416 to 8 (out of
10733). 3 of those crash when roundtripping at ToT anyway.

* modulo various unavoidable uses of pointer types when validating IR
  (for now) and in the way globals are parsed, unfortunately. These
  cases will either go away (because such validation will no longer be
  necessary or possible when pointee types are opaque), or have to be
  made simultaneously with the removal of pointee types.

llvm-svn: 243356
2015-07-28 00:06:38 +00:00
Adam Nemet 54f0b83ee2 [LAA] Split out a helper to print a collection of memchecks
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address.  This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).

E.g. before we printed this:

    Run-time memory checks:
    Check 0:
      Comparing group 0:
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
      Against group 1:
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
    ...
    Grouped accesses:
      Group 0:
        (Low: %c High: (78 + %c))
          Member: {%c,+,4}<%for.body>
          Member: {(2 + %c),+,4}<%for.body>

Now we print this (changes are underlined):

    Run-time memory checks:
    Check 0:
      Comparing group (0x7f9c6040c320):
                       ~~~~~~~~~~~~~~
        %arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
        %arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
      Against group (0x7f9c6040c358):
                     ~~~~~~~~~~~~~~
        %arrayidxA1 = getelementptr i16, i16* %a, i64 %add
        %arrayidxA = getelementptr i16, i16* %a, i64 %ind
    ...
    Grouped accesses:
      Group 0x7f9c6040c320:
            ~~~~~~~~~~~~~~
        (Low: %c High: (78 + %c))
          Member: {(2 + %c),+,4}<%for.body>
          Member: {%c,+,4}<%for.body>

llvm-svn: 243354
2015-07-27 23:54:41 +00:00
David Blaikie 41ba2b47da [opaque pointers] Avoid the use of pointee types when parsing inline asm in IR
When parsing calls to inline asm the pointee type (of the pointer type
representing the value type of the InlineAsm value) was used. To avoid
using it, use the ValID structure to ferry the FunctionType directly
through to the InlineAsm construction.

This is a bit of a workaround - alternatively the inline asm could
explicitly describe the type but that'd be verbose/redundant in the IR
and so long as the inline asm calls directly in the context of a call or
invoke, this should suffice.

llvm-svn: 243349
2015-07-27 23:32:19 +00:00
Sanjoy Das 93b3504aa8 [LSR] Generate and use zero extends
Summary:
If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.

Depends on D9180

Reviewers: qcolombet, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9181

llvm-svn: 243348
2015-07-27 23:27:51 +00:00
Sanjoy Das c3182d8c43 [TargetTransformInfo][NFCI] Add TargetTransformInfo::isZExtFree.
Summary:
This function is not used in this change but will be used in a
subsequent change.

Reviewers: mcrosier, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9180

llvm-svn: 243347
2015-07-27 23:27:43 +00:00
JF Bastien 088c47ee5b WebAssembly: add a generic CPU
Summary: WebAssemblySubtarget.cpp expects a default 'generic' CPU to exist, and this seems to be prevalent with other targets. It makes sense to have something between MVP and bleeding-edge, even though for now it's the same as MVP. This removes a warning that's currently generated.

Subscribers: jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11546

llvm-svn: 243345
2015-07-27 23:25:54 +00:00
Alex Lorenz 8a1915b04e MIR Serialization: Serialize the unnamed basic block references.
This commit serializes the references from the machine basic blocks to the
unnamed basic blocks.

This commit adds a new attribute to the machine basic block's YAML mapping
called 'ir-block'. This attribute contains the actual reference to the
basic block.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243340
2015-07-27 22:42:41 +00:00
JF Bastien 6c6efa1786 WebAssembly: more MCAsmInfo nits.
Summary: As suggested by sunfish.

Subscribers: jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11544

llvm-svn: 243339
2015-07-27 22:40:31 +00:00
Colin LeMahieu fe36f83b11 [llvm-mc] Add --no-warn flag with -W alias to disable outputting warnings while assembling.
llvm-svn: 243338
2015-07-27 22:39:14 +00:00
Alex Lorenz 991a6241d3 IR: Expose the method 'getLocalSlot' in the module slot tracker.
This commit publicly exposes the method 'getLocalSlot' in the
'ModuleSlotTracker' class.

This change is useful for MIR serialization, to serialize the unnamed basic
block and unnamed alloca references.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243336
2015-07-27 22:31:04 +00:00
Alexandros Lamprineas 4ea707555a - Added support for parsing HWDiv features using Target Parser.
- Architecture extensions are represented as a bitmap.

Phabricator: http://reviews.llvm.org/D11457
llvm-svn: 243335
2015-07-27 22:26:59 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Sanjoy Das 5dab205ced [IndVars] Make loop varying predicates loop invariant.
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"

This change re-implements D9784 with the two differences:

 1. It does not use SCEVExpander and does not generate new
    instructions.  Instead, it does a quick local search for existing
    `llvm::Value`s that it needs when modifying the `icmp`
    instruction.

 2. It is more general -- it deals with both increasing and decreasing
    induction variables.

I've added all of the tests included with D9784, and two more.

As an example on what this change does (copied from D9784):

Given C code:

```
for (int i = M; i < N; i++) // i is known not to overflow
  if (i < 0) break;
  a[i] = 0;
}
```

This transformation produces:

```
for (int i = M; i < N; i++)
  if (M < 0) break;
  a[i] = 0;
}
```

Which can be unswitched into:

```
if (!(M < 0))
  for (int i = M; i < N; i++)
    a[i] = 0;
}
```

I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine.  Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison.  I'm open to putting it in its own helper routine if you
think that is better.

Reviewers: reames, nicholas, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11278

llvm-svn: 243331
2015-07-27 21:42:49 +00:00
Sanjay Patel 1cf245fd96 remove unnecessary forward declaration; NFC
llvm-svn: 243328
2015-07-27 21:11:55 +00:00
Sanjay Patel aa99a2304d don't repeat function names in comments; NFC
llvm-svn: 243327
2015-07-27 21:03:03 +00:00
JF Bastien 1a12bf1aa2 WebAssembly: minor MCAsmInfo fixes
Summary:
Fix pointer / callee-save stack sto size.
Update comment character to be LISP-ish.

Subscribers: llvm-commits, sunfish, jfb

Differential Revision: http://reviews.llvm.org/D11537

llvm-svn: 243326
2015-07-27 20:46:51 +00:00
Alex Lorenz 5b0d5f6f26 MIR Serialization: Serialize the '.cfi_def_cfa_register' CFI instruction.
llvm-svn: 243322
2015-07-27 20:39:03 +00:00
Alex Lorenz 1ea608986d MIR Parser: Rename the standalone parsing methods. NFC.
This commit renames the methods 'parseMBB' and 'parseNamedRegister' to
'parseStandaloneMBB' and 'parseStandaloneNamedRegister' in order for their
names to be consistent with the method 'parseStandaloneVirtualRegister'.

llvm-svn: 243319
2015-07-27 20:29:27 +00:00
Bruno Cardoso Lopes b20841df44 Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources"
Still breaks some ARM buildbots. This reverts r243271.

llvm-svn: 243318
2015-07-27 20:26:04 +00:00
Adam Nemet 7c52e0527d [LAA] Upper-case variable names, NFC
llvm-svn: 243313
2015-07-27 19:38:50 +00:00
Adam Nemet bbe1f1de16 [LAA] Split out a helper from addRuntimeCheck to generate the check, NFC
llvm-svn: 243312
2015-07-27 19:38:48 +00:00
Akira Hatanaka 2541e0241c [AArch64] Remove check for Darwin that was needed to decide if x18 should
be reserved.

The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.

llvm-svn: 243308
2015-07-27 19:18:47 +00:00
Simon Pilgrim 074c0d97dc Fixed signed/unsigned comparison warning.
llvm-svn: 243306
2015-07-27 19:07:15 +00:00
Simon Pilgrim 15c0a59463 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
Pete Cooper 11bd958cb6 Revert "Remove unnecessary null check. NFC."
This reverts commit r243167.

Duncan pointed out that dyn_cast can return null in these cases, so this
was an unsafe commit to make.  Sorry for the noise.

Worryingly there were no tests which fail...

llvm-svn: 243302
2015-07-27 18:37:58 +00:00
Matt Arsenault 95365ca482 Fix assert when inlining a constantexpr addrspacecast
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.

I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.

llvm-svn: 243300
2015-07-27 18:31:03 +00:00
Diego Novillo cd973c4f77 Fix ODR violation. NFC.
There is an ODR conflict between lib/ExecutionEngine/ExecutionEngineBindings.cpp
and lib/Target/TargetMachineC.cpp. The inline definitions should simply
be marked static (thanks dblaikie for the hint).

llvm-svn: 243298
2015-07-27 18:27:23 +00:00
Marek Olsak 93df060871 AMDGPU: don't match vgpr loads for constant loads
Author: Dave Airlie <airlied@redhat.com>

In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.

this should be applicable for llvm 3.7 as well.

llvm-svn: 243294
2015-07-27 18:16:08 +00:00
Sanjay Patel c1c2b87001 move combineRepeatedFPDivisors logic into a helper function; NFCI
llvm-svn: 243293
2015-07-27 17:58:49 +00:00
Alex Lorenz 10b23525cc Reset the virtual registers in liveins when clearing the virtual registers.
This commit zeroes out the virtual register references in the machine
function's liveins in the class 'MachineRegisterInfo' when the virtual
register definitions are cleared.

Reviewers: Matthias Braun
llvm-svn: 243290
2015-07-27 17:51:59 +00:00
Alex Lorenz 12045a4b59 MIR Serialization: Serialize the machine function's liveins.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243288
2015-07-27 17:42:45 +00:00
Sanjay Patel beb4cffb43 fix typo and spacing; NFC
llvm-svn: 243287
2015-07-27 17:39:20 +00:00
Davide Italiano fa04402e24 [TableGen] Emit the correct error message.
llvm-svn: 243284
2015-07-27 17:22:19 +00:00
Pete Cooper 0ae7393027 Revert "Add const to a bunch of Type* in DataLayout. NFC."
This reverts commit r243135.

Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.

llvm-svn: 243283
2015-07-27 17:15:28 +00:00
Pete Cooper 2e20147403 Revert "Add const to some Type* parameters which didn't need to be mutable. NFC."
This reverts commit r243146.

Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state.

llvm-svn: 243282
2015-07-27 17:15:24 +00:00
Bruno Cardoso Lopes 669c921bfd [PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r242295 with fixes in the implementation.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 243271
2015-07-27 14:39:46 +00:00
Silviu Baranga 7581d22512 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

llvm-svn: 243270
2015-07-27 14:39:34 +00:00
Simon Pilgrim 81accb7b27 [X86] Reordered lowerVectorShuffleAsBitMask before lowerVectorShuffleAsBlend. NFCI.
Allows us to show diffs for D11518 more clearly

llvm-svn: 243264
2015-07-27 12:37:19 +00:00
Marek Olsak 1354b87695 AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaround
This is a candidate for 3.7.

llvm-svn: 243263
2015-07-27 11:37:42 +00:00
NAKAMURA Takumi 94abbbd6ab LoopAccessAnalysis.cpp: Tweak r243239 to avoid side effects. It caused different emissions between gcc and clang.
llvm-svn: 243258
2015-07-27 01:35:30 +00:00
Sean Silva e1c6b549ef Avoid using uncommon acronym "MSROM".
llvm-svn: 243256
2015-07-27 00:46:59 +00:00
Jingyue Wu bfefff555e Roll forward r243250
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.

llvm-svn: 243253
2015-07-26 19:10:03 +00:00
Jingyue Wu 84879b71a9 Revert r243250
breaks tests

llvm-svn: 243251
2015-07-26 18:30:13 +00:00
Jingyue Wu bf485f059c [TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.

I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.

Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.

This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.

Reviewers: chandlerc, hfinkel

Subscribers: aschwaighofer, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D9819

llvm-svn: 243250
2015-07-26 17:28:13 +00:00
Igor Breger f2460112ad Implemented encoding and intrinsics of the following instructions
vunpckhps/pd, vunpcklps/pd, 
  vpunpcklbw, vpunpckhbw, vpunpcklwd, vpunpckhwd, vpunpckldq, vpunpckhdq, vpunpcklqdq, vpunpckhqdq
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11509

llvm-svn: 243246
2015-07-26 14:41:44 +00:00
Adam Nemet 1da7df3700 [LAA] Begin moving the logic of generating checks out of addRuntimeCheck
Summary:
The goal is to start moving us closer to the model where
RuntimePointerChecking will compute and store the checks.  Then a client
can filter the check according to its requirements and then use the
filtered list of checks with addRuntimeCheck.

Before the patch, this is all done in addRuntimeCheck.  So the patch
starts to split up addRuntimeCheck while providing the old API under
what's more or less a wrapper now.

The new underlying addRuntimeCheck takes a collection of checks now,
expands the code for the bounds then generates the code for the checks.

I am not completely happy with making expandBounds static because now it
needs so many explicit arguments but I don't want to make the type
PointerBounds part of LAI.  This should get fixed when addRuntimeCheck
is moved to LoopVersioning where it really belongs, IMO.

Audited the assembly diff of the testsuite (including externals).  There
is a tiny bit of assembly churn that is due to the different order the
code for the bounds is expanded now
(MultiSource/Benchmarks/Prolangs-C/bison/conflicts.s and with LoopDist
on 456.hmmer/fast_algorithms.s).

Reviewers: hfinkel

Subscribers: klimek, llvm-commits

Differential Revision: http://reviews.llvm.org/D11205

llvm-svn: 243239
2015-07-26 05:32:14 +00:00
Simon Pilgrim 54fcd62c6f [InstCombine][SSE4A] Standardized references to Length/Width and Index/Start to match AMD docs. NFCI.
llvm-svn: 243226
2015-07-25 20:41:00 +00:00
Chen Li 145c2f57ae [LoopUnswitch] Improve loop unswitch pass to find trivial unswitch conditions more effectively
Summary:
This patch improves trivial loop unswitch. 

The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass. 

Reviewers: reames, broune

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11481

llvm-svn: 243203
2015-07-25 03:21:06 +00:00
Juergen Ributzka 6364985b58 [AArch64][FastISel] Always use an AND instruction when truncating to non-legal types.
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.

This commit fixes this and adds the missing i32 truncate tests.

This fixes rdar://problem/21990703.

llvm-svn: 243198
2015-07-25 02:16:53 +00:00
Eric Christopher f0024d14f1 Fix PPCMaterializeInt to check the size of the integer based on the
extension property we're requesting - zero or sign extended.

This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.

llvm-svn: 243192
2015-07-25 00:48:08 +00:00
Eric Christopher 03df7ac8a9 PPCMaterializeInt should only take a ConstantInt so represent this in the prototype
and fix up all uses.

llvm-svn: 243191
2015-07-25 00:48:06 +00:00
Akira Hatanaka 0d4c9ea6e0 [AArch64] Define subtarget feature "reserve-x18", which is used to decide
whether register x18 should be reserved.

This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.

Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D11463

llvm-svn: 243186
2015-07-25 00:18:31 +00:00
Duncan P. N. Exon Smith 56b893b364 DI/Verifier: Fix argument bitrot in DILocalVariable
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field.  These are
the only configurations that are properly understood by the backend.

(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)

A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters.  If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:

    sed -e 's/, arg: 0/, arg: 1/'

or some such, but I hand-updated the ones in tree.

llvm-svn: 243183
2015-07-24 23:59:25 +00:00
Alex Lorenz 1bb48de1f9 MIR Serialization: Serialize MachineFrameInfo's callee saved information.
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.

llvm-svn: 243173
2015-07-24 22:22:50 +00:00
Lawrence Hu dc8a83b53b Handle loop with negtive induction variable increment
This patch extend LoopReroll pass to hand the loops which
is similar to the following:

      while (len > 1) {
            sum4 += buf[len];
            sum4 += buf[len-1];
            len -= 2;
        }

llvm-svn: 243171
2015-07-24 22:01:49 +00:00
Pete Cooper 3191697138 Remove unnecessary null check. NFC.
Since both places which set this variable do so with dyn_cast, and not
dyn_cast_or_null, its impossible to get a nullptr here, so we can remove
the check.

llvm-svn: 243167
2015-07-24 21:38:01 +00:00
Pete Cooper 7679afda82 Use make_range(rbegin(), rend()) to allow foreach loops. NFC.
Instead of the pattern

for (auto I = x.rbegin(), E = x.end(); I != E; ++I)

we can use make_range to construct the reverse range and iterate using
that instead.

llvm-svn: 243163
2015-07-24 21:13:43 +00:00
Duncan P. N. Exon Smith b9e045af44 DI: Remove unnecessary DICompositeTypeBase
Remove unnecessary and confusing common base class for `DICompositeType`
and `DISubroutineType`.

While at a high-level `DISubroutineType` is a sort of composite of other
types, it has no shared code paths, and its fields are completely
disjoint.  This relationship was left over from the old debug info
hierarchy.

llvm-svn: 243160
2015-07-24 20:56:36 +00:00
Duncan P. N. Exon Smith 260fa8a75b DI: Simplify DebugInfoFinder::processType(), NFC
Handle `DISubroutineType` up-front rather than as part of a branch for
`DICompositeTypeBase`.  The only shared code path was looking through
the base type, but `DISubroutineType` can never have a base type.

This also removes the last use of `DICompositeTypeBase`, since we can
strengthen the cast to `DICompositeType`.

llvm-svn: 243159
2015-07-24 20:56:10 +00:00
Duncan P. N. Exon Smith 3c5a56b13c DI: Remove dead code: getDICompositeType()
llvm-svn: 243158
2015-07-24 20:46:46 +00:00
Duncan P. N. Exon Smith acd8cf8582 AsmPrinter: Use DICompositeType in updateAcceleratorTables(), NFC
`DISubroutineType` is impossible at this `dyn_cast` site, since we're
only dealing with named types and `DISubroutineType` cannot be named.
Strengthen the `dyn_cast` to `DICompositeType`.

llvm-svn: 243157
2015-07-24 20:45:26 +00:00
Alex Lorenz ab4cbcfda7 MIR Serialization: Serialize the simple virtual register allocation hints.
This commit serializes the virtual register allocations hints of type 0.
These hints specify the preferred physical registers for allocations.

llvm-svn: 243156
2015-07-24 20:35:40 +00:00
Duncan P. N. Exon Smith 338aef0a07 DI: Remove DIDerivedTypeBase
Remove an unnecessary (and confusing) common subclass for
`DIDerivedType` and `DICompositeType`.  These classes aren't really
related, and even in the old debug info hierarchy, there was a
long-standing FIXME to separate them.

llvm-svn: 243152
2015-07-24 20:16:36 +00:00
Duncan P. N. Exon Smith dbfc010691 Verifier: Sink filename check into visitMDCompositeType(), NFC
We really only want to check this for unions and classes (all the other
tags have been ruled out), so simplify the check and move it to the
right place.

llvm-svn: 243150
2015-07-24 19:57:19 +00:00
Duncan P. N. Exon Smith df9c9ff43b Verifier: Remove unnecessary references to DW_TAG_subroutine_type, NFC
Remove unnecessary references to `DW_TAG_subroutine_type` in
`visitDICompositeType()` and `visitDIDerivedTypeBase()`, since
`visitDISubroutineType()` doesn't call either of those (and shouldn't,
since subroutine types are really quite special).

llvm-svn: 243149
2015-07-24 19:52:18 +00:00
Duncan P. N. Exon Smith 89c5e6ff49 DI: Clarify isUnsignedDIType(), NFC
Refactor `isUnsignedDIType()` to deal with `DICompositeType` explicitly.
Since `DW_TAG_subroutine_type` isn't handled here (the assertions about
tags rule it out), this allows strengthening the `dyn_cast` to
`DIDerivedType`.

Besides making the code clearer, this it removes a use of
`DIDerivedTypeBase`.

llvm-svn: 243148
2015-07-24 19:42:12 +00:00
Pete Cooper 098f7c1fcb Add const to some Type* parameters which didn't need to be mutable. NFC.
We were only getting the size of the type which doesn't need to modify
the type.

llvm-svn: 243146
2015-07-24 19:19:26 +00:00
Diego Novillo b9bf447d90 Remove unused variable. NFC.
llvm-svn: 243145
2015-07-24 19:18:32 +00:00
Duncan P. N. Exon Smith 2ecfb1e269 DI: Strengthen some dyn_casts to DIDerivedType, NFC
The surrounding code proves in both cases that these must be
`DIDerivedType` if they're `DIDerivedTypeBase`, so strengthen the
`dyn_cast`s to the more specific type.

llvm-svn: 243143
2015-07-24 19:17:20 +00:00
Jingyue Wu abb05aa3c6 Remove the user-count threshold when analyzing read attributes
Summary:
This threshold limited FunctionAttrs ability to prove arguments to be read-only. 
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.

Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.

Patched by Xuetian Weng.  

[1] https://github.com/vetter/shoc

Reviewers: nlewycky, jingyue, nicholas

Subscribers: nicholas, test, llvm-commits

Differential Revision: http://reviews.llvm.org/D11311

llvm-svn: 243141
2015-07-24 19:05:53 +00:00
Philip Reames fa2c630f79 [RewriteStatepointsForGC] Adjust naming scheme to be more stable
The names for instructions inserted were previous dependent on iteration order.  By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals.  It also makes the IR mildly easier to read at large scale.

llvm-svn: 243140
2015-07-24 19:01:39 +00:00
Duncan P. N. Exon Smith 099ea1c9ae DI: Strengthen block-byref cast to DIDerivedType, NFC
This code is visiting the members of a block-byref, and we know those
are all `DIDerivedType`.  Strengthen the cast.

llvm-svn: 243138
2015-07-24 18:58:32 +00:00
Pete Cooper 0debbdc872 Use foreach loops for StructType::elements(). NFC.
We had a few places where we did

for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {

but those could instead do

for (auto *EltTy : STy->elements()) {

llvm-svn: 243136
2015-07-24 18:55:49 +00:00
Pete Cooper 6aaad8d88d Add const to a bunch of Type* in DataLayout. NFC.
Almost all methods in DataLayout took mutable pointers but didn't need to.
These were only accessing constant methods of the types, or using the Type*
to key a map.  Neither of these needs a mutable pointer.

llvm-svn: 243135
2015-07-24 18:29:09 +00:00
Duncan P. N. Exon Smith 6ac940db19 DI: Only DICompositeType has getElements(), NFC
There is an assertion inside `DICompositeTypeBase::getElements()` that
`this` is not a `DISubroutineType`, leaving only `DICompositeType`.
Make that clear at the call sites.

llvm-svn: 243134
2015-07-24 18:17:17 +00:00
Alex Lorenz c7bf20403b MIR Parser: Run the machine verifier after initializing machine functions.
llvm-svn: 243128
2015-07-24 17:44:49 +00:00
Lang Hames a8183e5c40 [RuntimeDyld] MachO: Add support for ARM scattered vanilla relocations.
llvm-svn: 243126
2015-07-24 17:40:04 +00:00
Igor Breger 074a64e72c AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic

Differential Revision: http://reviews.llvm.org/D11218

llvm-svn: 243122
2015-07-24 17:24:15 +00:00
Mehdi Amini 26d481311a Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.

This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11103

(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243114
2015-07-24 16:04:22 +00:00
Sanjay Patel 0495dbf1e1 fix wrong comment; NFC
llvm-svn: 243113
2015-07-24 16:02:14 +00:00
Luke Cheeseman 4d45ff2b87 [ARM] - Fix lowering of shufflevectors in AArch32
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.

This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.

This patch fixes this by accommodating for both cases and adds regression tests.

Differential Revision: http://reviews.llvm.org/D11407

llvm-svn: 243103
2015-07-24 09:57:05 +00:00
Luke Cheeseman b5c627aba8 When lowering vector shifts a check is performed to see if the value to shift by
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.

Differential Revision: http://reviews.llvm.org/D11408

llvm-svn: 243100
2015-07-24 09:31:48 +00:00
Mehdi Amini 5d8e569926 Revert "Remove access to the DataLayout in the TargetMachine"
This reverts commit 0f720d984f419c747709462f7476dff962c0bc41.

It breaks clang too badly, I need to prepare a proper patch for clang
first.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243089
2015-07-24 03:36:55 +00:00
Alexei Starovoitov 01886a05b8 [bpf] initial support for debug_info
llvm-svn: 243087
2015-07-24 03:17:08 +00:00
Michael Zolotukhin 57776b8159 Handle resolvable branches in complete loop unroll heuristic.
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10206

llvm-svn: 243084
2015-07-24 01:53:04 +00:00
Mehdi Amini b4bc424c9a Remove access to the DataLayout in the TargetMachine
Summary:
Replace getDataLayout() with a createDataLayout() method to make
explicit that it is intended to create a DataLayout only and not
accessing it for other purpose.

This change is the last of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned
by the module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11103

(cherry picked from commit 5609fc56bca971e5a7efeaa6ca4676638eaec5ea)

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 243083
2015-07-24 01:44:39 +00:00
NAKAMURA Takumi a6ccd6cd15 MIRParser/LLVMBuild.txt: Add MC for MCRegisterInfo::getDwarfRegNum().
llvm-svn: 243081
2015-07-24 01:12:36 +00:00
NAKAMURA Takumi d12ebaf9a4 Reorder alphabetically.
llvm-svn: 243080
2015-07-24 01:12:28 +00:00
Kostya Serebryany 404c69f2c8 [libFuzzer] allow users to supply their own implementation of rand
llvm-svn: 243078
2015-07-24 01:06:40 +00:00
Philip Reames 29e9ae7891 [RewriteStatepointsForGC] Fix release build warning
llvm-svn: 243076
2015-07-24 00:42:55 +00:00
Philip Reames 88958b2df3 [RewriteStatepointsForGC] Use a worklist algorithm for first part of base pointer algorithm [NFC]
The new code should hopefully be equivalent to the old code; it just uses a worklist to track instructions which need to visited rather than iterating over all instructions visited each time. This should be faster, but the primary benefit is that the purpose should be more clear and the diff of adding another instruction type (forthcoming) much more obvious.

Differential Revision: http://reviews.llvm.org/D11480

llvm-svn: 243071
2015-07-24 00:02:11 +00:00
Lawrence Hu 687097a0a9 test commit, only added one space
llvm-svn: 243070
2015-07-23 23:55:28 +00:00
Jingyue Wu 2e424da39b [NaryReassociate] remove redundant code
This check is already done by findClosestMatchingDominator.

llvm-svn: 243065
2015-07-23 23:13:37 +00:00
Alex Lorenz 8cfc68677c MIR Serialization: Serialize the '.cfi_offset' CFI instruction.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243062
2015-07-23 23:09:07 +00:00
Sanjay Patel f2fa58e744 fix crash in machine trace metrics due to processing dbg_value instructions (PR24199)
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.

The machine-combiner pass asks machine trace metrics to calculate an instruction trace, 
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.

On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.

Differential Revision: http://reviews.llvm.org/D11423

llvm-svn: 243057
2015-07-23 22:56:53 +00:00
Philip Reames 9b141ed48e [RewriteStatepointsForGC] Rename PhiState to reflect that it's associated w/more than just PHIs
Today, Select instructions also have associated PhiStates.  In the near future, so will ExtractElement and SuffleVector.

llvm-svn: 243056
2015-07-23 22:49:14 +00:00
Philip Reames 2a892a630b [RewriteStatepointsForGC] Use idomatic mechanisms for debug tracing [NFC]
Deleting much of the code using trace-rewrite-statepoints and use idiomatic DEBUG statements instead.  This includes adding operator<< to a helper class.

llvm-svn: 243054
2015-07-23 22:25:26 +00:00
David Gross d9c1bc9955 [ARM] Register (existing) ARMLoadStoreOpt pass with LLVM pass manager.
Summary: Among other things, this allows -print-after-all/-print-before-all to dump IR around this pass.

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11373

llvm-svn: 243052
2015-07-23 22:12:46 +00:00
David Gross 2ad5d173ce Test commit.
llvm-svn: 243046
2015-07-23 21:46:09 +00:00
Philip Reames 273e6bbd11 [RewriteStatepointsForGC] Simplify code around meet of PhiStates [NFC]
We don't need to pass in the map from BDV to PhiStates; we can instead handle that externally and let the MeetPhiStates helper class just meet PhiStates.

llvm-svn: 243045
2015-07-23 21:41:27 +00:00
Matt Wala 878c144f8a [Scalarizer] Fix potential for stale data in Scattered across invocations
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.

However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.

The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish(). This also fixes a problem
where the Function can be modified although the pass returns false.

Reviewers: rnk

Subscribers: rnk, srhines, llvm-commits

Differential Revision: http://reviews.llvm.org/D10459

llvm-svn: 243040
2015-07-23 20:53:46 +00:00
Duncan P. N. Exon Smith d531322149 X86: Use dyn_cast instead of isa+cast, NFC
llvm-svn: 243034
2015-07-23 19:27:07 +00:00
Weiming Zhao b33a5557f4 This patch eanble register coalescing to coalesce the following:
%vreg2<def> = MOVi32imm 1; GPR32:%vreg2
  %W1<def> = COPY %vreg2; GPR32:%vreg2
into:
  %W1<def> = MOVi32imm 1
Patched by Lawrence Hu (lawrence@codeaurora.org)

llvm-svn: 243033
2015-07-23 19:24:53 +00:00
Kostya Serebryany 2b7d2e91cc [libFuzzer] dump long running units to disk
llvm-svn: 243031
2015-07-23 18:37:22 +00:00
Michael Kuperstein 454d145395 [X86] Allow load folding into PUSH instructions
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of 
the push instructions did not have the right mayLoad/mayStore flags.

Differential Revision: http://reviews.llvm.org/D11340

llvm-svn: 243010
2015-07-23 12:23:45 +00:00
Kuba Brecka 45dbffdc3d [asan] Rename the ABI versioning symbol to '__asan_version_mismatch_check' instead of abusing '__asan_init'
We currently version `__asan_init` and when the ABI version doesn't match, the linker gives a `undefined reference to '__asan_init_v5'` message. From this, it might not be obvious that it's actually a version mismatch error. This patch makes the error message much clearer by changing the name of the undefined symbol to be `__asan_version_mismatch_check_xxx` (followed by the version string). We obviously don't want the initializer to be named like that, so it's a separate symbol that is used only for the purpose of version checking.

Reviewed at http://reviews.llvm.org/D11004

llvm-svn: 243003
2015-07-23 10:54:06 +00:00
Michael Kuperstein ffcc7663a2 [X86] Fix order of operands for ins and outs instructions when parsing intel syntax
Patch by: marina.yatsina@intel.com
Differential Revision: http://reviews.llvm.org/D11337

llvm-svn: 243001
2015-07-23 10:23:48 +00:00
Chandler Carruth 08eebe2074 [GMR] Add a late run of GlobalsModRef to the main pass pipeline behind
the general GMR-in-non-LTO flag.

Without this, we have the global information during the CGSCC pipeline
for GVN and such, but don't have it available during the late loop
optimizations such as the vectorizer. Moreover, after the CGSCC pipeline
has finished we have substantially more accurate and refined call graph
information, function annotations, etc, which will make GMR even more
powerful than it is early in the pipelien.

Note that we have to play silly games with preserving AliasAnalysis
(which is now trivially preserved) in order to let a module analysis
magically be preserved into the entire function pass pipeline.
Simultaneously we have to not make GMR an immutable pass in order to be
able to re-run it and collect fresh data on the final call graph.

llvm-svn: 242999
2015-07-23 09:34:01 +00:00
Elena Demikhovsky 482b303254 X86: Fixed assertion failure in 32-bit mode
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.

Differential Revision: http://reviews.llvm.org/D11413

llvm-svn: 242994
2015-07-23 08:25:23 +00:00
Chandler Carruth fe414353db Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..."
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.

It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.

llvm-svn: 242992
2015-07-23 08:03:44 +00:00
Chandler Carruth 8e4357d137 [GMR] Switch the function info we store for every function to be a much
more dense datastructure. We actually only have 3 bits of information
and an often-null pointer here. This fits very nicely into a
pointer-size value in the DenseMap from Function -> Info. Then we take
one more pointer hop to get to a secondary DenseMap from GlobalValue ->
ModRefInfo when we actually have precise info for particular globals.

This is more code than I would really like to do this packing, but it
ended up reasonably cleanly laid out. It should ensure we don't hit
scaling limitations with more widespread use of GMR.

llvm-svn: 242991
2015-07-23 07:50:52 +00:00
Igor Breger da1b2ea955 AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsic

Differential Revision: http://reviews.llvm.org/D11218

llvm-svn: 242990
2015-07-23 07:39:21 +00:00
Craig Topper ac7947ec32 [ScalarEvolution] Change addRequired to addRequiredTransitive on two passes where ScalarEvolution stores long lived raw pointers to objects those passes own.
This prevents the pointers from dangling when those passes are freed.

http://reviews.llvm.org/D11236
Patch by Steve King.

llvm-svn: 242989
2015-07-23 07:33:48 +00:00
Igor Breger 87e6397fb1 AVX : Fix ISA disabling in case AVX512VL , some instructions should be disabled only if AVX512BW and AVX512VL present.
Tests added.

Differential Revision: http://reviews.llvm.org/D11414

llvm-svn: 242987
2015-07-23 07:11:14 +00:00
Yaron Keren 4135e4c475 Remove unnecessary in C++11 c_str() calls
While theoratically required in pre-C++11 to avoid re-allocation upon call,
C++11 guarantees that c_str() returns a pointer to the internal array so
pre-calling c_str() is no longer required.

llvm-svn: 242983
2015-07-23 05:49:29 +00:00
Jingyue Wu 6a3fdeca22 [NVPTX] run LSR before straight-line optimizations
Summary:
Straight-line optimizations can simplify the loop body and make LSR's
cost analysis more precise. This significantly improves several Eigen3
CUDA benchmarks.

With this change, EigenContractionKernel runs up to 40% faster
(753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-502).
EigenConvolutionKernel2D runs up to 10% faster
(753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h?at=default#cl-605).

I have some difficulties writing small tests that benefit from this
reordering due to a seemingly issue with LSR (being discussed at
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html).

See the review thread for the compilation time impact of GVN. 

Reviewers: eliben, jholewinski

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11304

llvm-svn: 242982
2015-07-23 04:59:07 +00:00
Chandler Carruth dbb4622ce3 [GMR] Further improve the FunctionInfo API inside of GlobalsModRef, NFC.
This takes the operation of merging a callee's information into the
current information and embeds it into the FunctionInfo type itself.
This is much cleaner as now we don't need to expose iteration of the
globals, etc.

Also, switched all the uses of a raw integer two maintain the mod/ref
info during the SCC walk into just directly manipulating it in the
FunctionInfo object.

llvm-svn: 242976
2015-07-23 00:12:32 +00:00
Chandler Carruth 5657b73ebe [GMR] Wrap all of the per-function information behind a more strongly
typed interface as a precursor to rewriting how it is stored.

This way we know that the access paths are controlled and it should be
easy to store these bits in a different way.

No functionality changed.

llvm-svn: 242974
2015-07-22 23:56:31 +00:00
Chandler Carruth 194f59ca5d [PM/AA] Extract the ModRef enums from the AliasAnalysis class in
preparation for de-coupling the AA implementations.

In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're bitfields and thus inherently we use
them as integers.

I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.

I think there is a significantly cleaner API for all of this, but
I don't want to try to do really substantive changes for now, I just
want to refactor the things away from analysis groups so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.

Differential Revision: http://reviews.llvm.org/D10564

llvm-svn: 242963
2015-07-22 23:15:57 +00:00
Chandler Carruth 61ddab6d4c [GMR] Continue my quest to remove linked datastructures from GMR, NFC.
This replaces the next-to-last std::map with a DenseMap. While DenseMap
doesn't yet make tons of sense (there are 32 bytes or so in the value
type), my next change will reduce the value type to a single pointer --
we only need a pointer and 3 bits, and that is exactly what we can have.

llvm-svn: 242956
2015-07-22 22:32:34 +00:00
David Majnemer ed9abe119b [ConstantFolding] Support folding loads from a GlobalAlias
The MSVC ABI requires that we generate an alias for the vtable which
means looking through a GlobalAlias which cannot be overridden improves
our ability to devirtualize.

Found while investigating PR20801.

Patch by Andrew Zhogin!

Differential Revision: http://reviews.llvm.org/D11306

llvm-svn: 242955
2015-07-22 22:29:30 +00:00
Anthony Pesch e92ae2dcd1 Revert "Improve merging of stores from static constructors in GlobalOpt"
This reverts commit 0a9dee959a30b81b9e7df64c9a58ff9898c24024.

llvm-svn: 242954
2015-07-22 22:26:54 +00:00
Anthony Pesch b8531f4f65 Revert "IPO: Avoid brace initialization of a map, some versions of libc++ don't like it"
This reverts commit fc2dad0c68f8d32273d3c2d790ed496961f829af.

llvm-svn: 242953
2015-07-22 22:26:52 +00:00
Chandler Carruth 4cef26eeca [GMR] Make the collection of readers and writers of globals much more
efficient, NFC.

Previously, we built up vectors of function pointers to track readers
and writers. The primary problem here is that we would add the same
function to this vector every time we found an instruction that reads or
writes to the pointer. This could be a *lot* of redudant function
pointers. Instead of doing that, we can use a SmallPtrSet.

This does more than just reduce the size of the list of readers or
writers. We walk the entire lists of each and do a map lookup for each
one. By having sets, we will only do one map lookup per reader or writer
function.

But only one user of the pointer analyzer actually needs this
information, so we can also skip accumulating it (and doing a lot of
heap allocations) for all the other pointer analysis. This is
particularly useful because there are very many more pointers in some of
the other cases.

llvm-svn: 242950
2015-07-22 22:10:05 +00:00
Sanjay Patel efad9eb914 fix typo; NFC
llvm-svn: 242947
2015-07-22 21:56:41 +00:00
Sanjay Patel 4a0ca3fa2e fix indent; NFC
llvm-svn: 242946
2015-07-22 21:47:13 +00:00
Justin Bogner 49c5ce67eb IPO: Avoid brace initialization of a map, some versions of libc++ don't like it
Should fix the build failure on these darwin bots:

http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/12427/
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/10389/

llvm-svn: 242945
2015-07-22 21:41:12 +00:00
Bruno Cardoso Lopes f16ec12654 [PeepholeOptimizer] Refactor optimizeUncoalescable logic
Reapply r242294.

- Create a new CopyRewriter for Uncoalescable copy-like instructions
- Change the ValueTracker to return a ValueTrackerResult

This makes optimizeUncoalescable looks more like optimizeCoalescable and
use the CopyRewritter infrastructure.

This is also the preparation for looking up into PHI nodes in the
ValueTracker.

rdar://problem/20404526

Differential Revision: http://reviews.llvm.org/D11195

llvm-svn: 242940
2015-07-22 21:30:16 +00:00
JF Bastien b9073fb20a WebAssembly: basic bitcode → assembly CodeGen test
Summary:
Add a basic CodeGen bitcode test which (for now) only prints out the function name and nothing else. The current code merely implements the basic needed for the test run to not crash / assert. Getting to that point required:

 - Basic InstPrinter.
 - Basic AsmPrinter.
 - DiagnosticInfoUnsupported (not strictly required, but nice to have, duplicated from AMDGPU/BPF's ISelLowering).
 - Some SP and register setup in WebAssemblyTargetLowering.
 - Basic LowerFormalArguments.
 - GenInstrInfo.
 - Placeholder LowerFormalArguments.
 - Placeholder CanLowerReturn and LowerReturn.
 - Basic DAGToDAGISel::Select, which requiresGenDAGISel.inc as well as GET_INSTRINFO_ENUM with GenInstrInfo.inc.
 - Remove WebAssemblyFrameLowering::determineCalleeSaves and rely on default.
 - Implement WebAssemblyFrameLowering::hasFP, same as AArch64's implementation.

Follow-up patches will implement a real AsmPrinter, which will require adding MI opcodes specific to WebAssembly.

Reviewers: sunfish

Subscribers: aemerson, jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D11369

llvm-svn: 242939
2015-07-22 21:28:15 +00:00
Alex Lorenz 46d760d161 MIR Serialization: Serialize the machine instruction's debug location.
llvm-svn: 242938
2015-07-22 21:15:11 +00:00
Yaron Keren 2873810c6f Rename RunCallBacksToRun to llvm::sys::RunSignalHandlers
And expose it in Signals.h, allowing clients to call it directly,
possibly LLVMErrorHandler which currently calls RunInterruptHandlers
but not RunSignalHandlers, thus for example not printing the stack
backtrace on Unixish OSes. On Windows it does happen because
RunInterruptHandlers ends up calling the callbacks as well via 
Cleanup(). This difference in behaviour and code structures in
*/Signals.inc should be patched in the future.

llvm-svn: 242936
2015-07-22 21:11:17 +00:00
Anthony Pesch 3da0acdcbc Improve merging of stores from static constructors in GlobalOpt
Summary:
While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking upwards of ~10 minutes to compile the lookup table. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll) that, after running with -ftime-report, pointed fingers at GlobalOpt and MemCpyOptimizer.

Running globalopt took around ~9 minutes. The slowdown came from how GlobalOpt merged stores from static constructors individually into the global initializer in EvaluateStaticConstructor. For each store it discovered and wanted to commit, it would copy the existing global initializer and then merge in the individual store. I changed this so that stores are now grouped by global, and sorted from most significant to least significant by their GEP indexes (e.g. a store to GEP 0, 0 comes before GEP 0, 0, 1). With this representation, the existing initializer can be copied and all new stores merged into it in a single pass.

With this patch and http://reviews.llvm.org/D11198, the lookup table that was taking ~10 minutes to compile now compiles in around 5 seconds. I've ran 'make check' and the test-suite, which all passed.

I'm not really sure who to tag as a reviewer, Lang mentioned that Chandler may be appropriate.

Reviewers: chandlerc, nlewycky

Subscribers: nlewycky, llvm-commits

Differential Revision: http://reviews.llvm.org/D11200

llvm-svn: 242935
2015-07-22 21:10:45 +00:00
Alex Lorenz 44f29259d0 MIR Parser: Extract the MDNode parsing code into a separate method. NFC.
This change would allow the machine instruction parser to reuse this method when
parsing the metadata node for the machine instruction's debug location property.

llvm-svn: 242934
2015-07-22 21:07:04 +00:00
Hans Wennborg 13958b739e Fix -Wextra-semi warnings.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D11400

llvm-svn: 242930
2015-07-22 20:46:11 +00:00
Rafael Espindola be9ab2682e Fix fetching the symbol table of a thin archive.
We were trying to read it as an external file.

llvm-svn: 242926
2015-07-22 19:34:26 +00:00
Yaron Keren 240bd9c875 De-duplicate Unix & Windows CallBacksToRun
Move CallBacksToRun into the common Signals.cpp, create RunCallBacksToRun()
and use these in both Unix/Signals.inc and Windows/Signals.inc.

Lots of potential code to be merged here.

llvm-svn: 242925
2015-07-22 19:01:14 +00:00
Anthony Pesch a2d9369ef3 Test commit, added blank line
llvm-svn: 242923
2015-07-22 18:50:10 +00:00
Chad Rosier 1bf48a6a69 Simplify switch as all cases other than default return true. NFC.
llvm-svn: 242922
2015-07-22 18:41:57 +00:00
Rafael Espindola 69ef2afaeb Identify thin archives as archives.
llvm-svn: 242921
2015-07-22 18:29:39 +00:00
Yaron Keren c2bcf1549b Remove C++98 workaround in llvm::sys::DontRemoveFileOnSignal()
llvm-svn: 242920
2015-07-22 18:23:51 +00:00
Alex Lorenz 35e4446903 MIR Serialization: Serialize the metadata machine operands.
llvm-svn: 242916
2015-07-22 17:58:46 +00:00
Quentin Colombet 48b772007f [ARM] Make the frame lowering code ready for shrink-wrapping.
Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.

Related to <rdar://problem/20821730>

llvm-svn: 242908
2015-07-22 16:34:37 +00:00
Asaf Badouh a5b2e5e2a7 [X86][AVX512] add reduce/range/scalef/rndScale
include encoding and intrinsics

Differential Revision: http://reviews.llvm.org/D11222

llvm-svn: 242896
2015-07-22 12:00:43 +00:00
Chandler Carruth e9ea5a66f2 [GMR] Add a flag to enable GlobalsModRef in the normal compilation
pipeline.

Even before I started improving its runtime, it was already crazy fast
once the call graph exists, and if we can get it to be conservatively
correct, will still likely catch a lot of interesting and useful cases.
So it may well be useful to enable by default.

But more importantly for me, this should make it easier for me to test
that changes aren't breaking it in fundamental ways by enabling it for
normal builds.

llvm-svn: 242895
2015-07-22 11:57:28 +00:00
Chandler Carruth f3af4af6b5 [GMR] Switch from std::set to SmallPtrSet. NFC.
This almost certainly doesn't matter in some deep sense, but std::set is
essentially always going to be slower here. Now the alias query should
be essentially constant time instead of having to chase the set tree
each time.

llvm-svn: 242893
2015-07-22 11:47:54 +00:00
Chandler Carruth 56e2c62a8d [GMR] Only look in the associated allocs map for an underlying value if
it wasn't one of the indirect globals (which clearly cannot be an
allocation function call). Also only do a single lookup into this map
instead of two. NFC.

llvm-svn: 242892
2015-07-22 11:43:24 +00:00
Chandler Carruth 6919267d0b [GMR] Switch to a DenseMap and clean up the iteration loop. NFC.
Since we have to iterate this map not that infrequently, we should use
a map that is efficient for iteration. It is also almost certainly much
faster for lookups as well. There is more to do in terms of reducing the
wasted overhead of GMR's runtime though. Not sure how much is worthwhile
though.

The loop improvements should hopefully address the code review that
Duncan gave when he saw this code as I moved it around.

llvm-svn: 242891
2015-07-22 11:36:09 +00:00
Chandler Carruth 6fb9c087cb Fix a -Winconsistent-missing-override failure in the .intel_syntax
patch.

llvm-svn: 242890
2015-07-22 11:22:29 +00:00
Chandler Carruth 8f1b63e973 [PM/AA] Try to fix libc++ build bots which require the type used in
std::list to be complete by hoisting the entire definition into the
class. Ugly, but hopefully works.

llvm-svn: 242888
2015-07-22 11:10:41 +00:00
Michael Kuperstein 23d952b611 [X86] Add .intel_syntax noprefix directive to intel-syntax x86 asm output
Patch by: michael.zuckerman@intel.com
Differential Revision: http://reviews.llvm.org/D11223

llvm-svn: 242886
2015-07-22 10:49:44 +00:00
Michael Kuperstein d72403636c Fix mem2reg to correctly handle allocas only used in a single block
Currently, a load from an alloca that is used in as single block and is not preceded
by a store is replaced by undef. This is not always correct if the single block is
inside a loop.
Fix the logic so that:
1) If there are no stores in the block, replace the load with an undef, as before.
2) If there is a store (regardless of where it is in the block w.r.t the load), bail
out, and let the rest of mem2reg handle this alloca.

Patch by: gil.rapaport@intel.com
Differential Revision: http://reviews.llvm.org/D11355

llvm-svn: 242884
2015-07-22 10:29:29 +00:00
Kuba Brecka 8ec94ead7d [asan] Improve moving of non-instrumented allocas
In r242510, non-instrumented allocas are now moved into the first basic block.  This patch limits that to only move allocas that are present *after* the first instrumented one (i.e. only move allocas up).  A testcase was updated to show behavior in these two cases.  Without the patch, an alloca could be moved down, and could cause an invalid IR.

Differential Revision: http://reviews.llvm.org/D11339

llvm-svn: 242883
2015-07-22 10:25:38 +00:00
Chandler Carruth 96ada25bf3 [PM/AA] Remove all of the dead AliasAnalysis pointers being threaded
through APIs that are no longer necessary now that the update API has
been removed.

This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.

llvm-svn: 242882
2015-07-22 09:52:54 +00:00
Chandler Carruth a1032a0f7c [PM/AA] Remove the last of the legacy update API from AliasAnalysis as
part of simplifying its interface and usage in preparation for porting
to work with the new pass manager.

Note that this will likely expose that we have dead arguments, members,
and maybe even pass requirements for AA. I'll be cleaning those up in
seperate patches. This just zaps the actual update API.

Differential Revision: http://reviews.llvm.org/D11325

llvm-svn: 242881
2015-07-22 09:49:59 +00:00
Chandler Carruth d86a4f5ec8 [PM/AA] Switch to an early-exit. NFC. This was split out of another
change because the diff is *useless*. I assure you, I just switched to
early-return in this function.

Cleanup in preparation for my next commit, as requested in code review!

llvm-svn: 242880
2015-07-22 09:44:54 +00:00
Chandler Carruth 1ffd12e374 [PM/AA] Put the 'final' keyword in the correct place. And actually
succeed at compiling my change before committing it too!

llvm-svn: 242879
2015-07-22 09:34:18 +00:00
Chandler Carruth da7c1919f7 [PM/AA] Replace the only use of the AliasAnalysis::deleteValue API (in
GlobalsModRef) with CallbackVHs that trigger the same behavior.

This is technically more expensive, but in benchmarking some LTO runs,
it seems unlikely to even be above the noise floor. The only way I was
able to measure the performance of GMR at all was to run nothing else
but this one analysis on a linked clang bitcode file. The call graph
analysis still took 5x more time than GMR, and this change at most made
GMR 2% slower (this is well within the noise, so its hard for me to be
sure that this is an actual change). However, in a real LTO run over the
same bitcode, the GMR run takes so little time that the pass timers
don't measure it.

With this, I can remove the last update API from the AliasAnalysis
interface, but I'll actually remove the interface hook point in
a follow-up commit.

Differential Revision: http://reviews.llvm.org/D11324

llvm-svn: 242878
2015-07-22 09:27:58 +00:00
Elena Demikhovsky a26f10ce18 AVX-512: Added intrinsics for VCVT* instructions.
All SKX forms. All VCVT instructions for float/double/int/long types.

Differential Revision: http://reviews.llvm.org/D11343

llvm-svn: 242877
2015-07-22 08:56:00 +00:00
Chen Li c0f3a158f0 [LoopUnswitch] Code refactoring to separate trivial loop unswitch and non-trivial loop unswitch in processCurrentLoop()
Summary: The current code in LoopUnswtich::processCurrentLoop() mixes trivial loop unswitch and non-trivial loop unswitch together. It goes over all basic blocks in the loop and checks if a condition is trivial or non-trivial unswitch condition. However, trivial unswitch condition can only occur in the loop header basic block (where it controls whether or not the loop does something at all). This refactoring separate trivial loop unswitch and non-trivial loop unswitch. Before going over all basic blocks in the loop, it checks if the loop header contains a trivial unswitch condition. If so, unswitch it. Otherwise, go over all blocks like before but don't check trivial condition any more since they are not possible to be in the other blocks. This code has no functionality change.

Reviewers: meheff, reames, broune

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11276

llvm-svn: 242873
2015-07-22 05:26:29 +00:00
Jingyue Wu 20d73c6cc0 [BranchFolding] do not iterate the aliases of virtual registers
Summary:
MCRegAliasIterator only works for physical registers. So, do not run it
on virtual registers.

With this issue fixed, we can resurrect the BranchFolding pass in NVPTX
backend.

Reviewers: jholewinski, bkramer

Subscribers: henryhu, meheff, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11174

llvm-svn: 242871
2015-07-22 04:16:52 +00:00
Chandler Carruth ccffdaf7ed [SROA] Fix a nasty pile of bugs to do with big-endian, different alloca
types and loads, loads or stores widened past the size of an alloca,
etc.

This started off with a bug report about big-endian behavior with
bitfields and loads and stores to a { i32, i24 } struct. An initial
attempt to fix this was sent for review in D10357, but that didn't
really get to the root of the problem.

The core issue was that canConvertValue and convertValue in SROA were
handling different bitwidth integers by doing a zext of the integer. It
wouldn't do a trunc though, only a zext! This would in turn lead SROA to
form an i24 load from an i24 alloca, zext it to i32, and then use it.
This would at least produce the wrong value for big-endian systems.

One of my many false starts here was to correct the computation for
big-endian systems by shifting. But this doesn't actually work because
the original code has a 64-bit store to the entire 8 bytes, and a 32-bit
load of the last 4 bytes, and because the alloc size is 8 bytes, we
can't lose that last (least significant if bigendian) byte! The real
problem here is that we're forming an i24 load in SROA which is actually
not sufficiently wide to load all of the necessary bits here. The source
has an i32 load, and SROA needs to form that as well.

The straightforward way to do this is to disable the zext logic in
canConvertValue and convertValue, forcing us to actually load all
32-bits. This seems like a really good change, but it in turn breaks
several other parts of SROA.

First in the chain of knock-on failures, we had places where we were
doing integer-widening promotion even though some of the integer loads
or stores extended *past the end* of the alloca's memory! There was even
a comment about preventing this, but it only prevented the case where
the type had a different bit size from its store size. So I added checks
to handle the cases where we actually have a widened load or store and
to avoid trying to special integer widening promotion in those cases.

Second, we actually rely on the ability to promote in the face of loads
past the end of an alloca! This is important so that we can (for
example) speculate loads around PHI nodes to do more promotion. The bits
loaded are garbage, but as long as they aren't used and the alignment is
suitable high (which it wasn't in the test case!) this is "fine". And we
can't stop promoting here, lots of things stop working well if we do. So
we need to add specific logic to handle the extension (and truncation)
case, but *only* where that extension or truncation are over bytes that
*are outside the alloca's allocated storage* and thus totally bogus to
load or store.

And of course, once we add back this correct handling of extension or
truncation, we need to correctly handle bigendian systems to avoid
re-introducing the exact bug that started us off on this chain of misery
in the first place, but this time even more subtle as it only happens
along speculated loads atop a PHI node.

I've ported an existing test for PHI speculation to the big-endian test
file and checked that we get that part correct, and I've added several
more interesting big-endian test cases that should help check that we're
getting this correct.

Fun times.

llvm-svn: 242869
2015-07-22 03:32:42 +00:00
Alexey Samsonov 4800c2de28 [Fuzzer] Rely on $PATH expansion instead of hardcoding paths in tests. NFC.
llvm-svn: 242851
2015-07-21 22:51:55 +00:00
Alexey Samsonov dc324e1644 [Fuzzer] Clearly separate regular and DFSan tests. NFC.
llvm-svn: 242850
2015-07-21 22:51:49 +00:00
Alex Lorenz f4baeb51b2 MIR Serialization: Start serializing the CFI operands with .cfi_def_cfa_offset.
This commit begins serialization of the CFI index machine operands by
serializing one kind of CFI instruction - the .cfi_def_cfa_offset instruction.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 242845
2015-07-21 22:28:27 +00:00
Nick Lewycky f836c89c49 Fix a performance problem in memcpyopt by removing a linear scan over ranges when inserting a new range. No functionality change intended. Patch by Anthony Pesch!
llvm-svn: 242843
2015-07-21 21:56:26 +00:00
Jingyue Wu d058ea927f [MDA] change BlockScanLimit into a command line option.
Summary:
In the benchmark (https://github.com/vetter/shoc) we are researching,
the duplicated load is not eliminated because MemoryDependenceAnalysis
hit the BlockScanLimit. This patch change it into a command line option
instead of a hardcoded value.

Patched by Xuetian Weng. 

Test Plan: test/Analysis/MemoryDependenceAnalysis/memdep-block-scan-limit.ll

Reviewers: jingyue, reames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D11366

llvm-svn: 242842
2015-07-21 21:50:39 +00:00