Commit Graph

56107 Commits

Author SHA1 Message Date
Vitaly Buka 84d912b7d2 [ThinLTO] Write TYPE_IDs for types used in functions imported by aliases
Summary:
ThinLTO imports alias as a copy of a aliasee, so when we import such functions with type tests we will
need type ids used by function. However after D49565 we pick types only during processing of
FunctionSummary which is not happening for such aliesees.

Example:
Unit U1 with a type, a functions F with the type check, and an alias A to the function.
Unit U2 with only call to the alias A.

In particular, this happens when we use -mconstructor-aliases, which is default.
So if c++ unit only creates instance of the class, without calling any other methods it will lack of
necessary type ids, which will result in false CFI reports.

Reviewers: tejohnson, eugenis

Subscribers: pcc, mehdi_amini, inglorion, eraman, hiraditya, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D52201

llvm-svn: 342574
2018-09-19 18:51:42 +00:00
Simon Atanasyan a9e8765e3e [mips][microMIPS] Extending size reduction pass with MOVEP
The patch extends size reduction pass for MicroMIPS. Two MOVE
instructions are transformed into one MOVEP instrucition.

Patch by Milena Vujosevic Janicic.

Differential revision: https://reviews.llvm.org/D52037

llvm-svn: 342572
2018-09-19 18:46:29 +00:00
Simon Pilgrim 8191d63c3b [X86] Add initial SimplifyDemandedVectorEltsForTargetNode support
This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.

Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.

Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.

NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.

Differential Revision: https://reviews.llvm.org/D52140

llvm-svn: 342564
2018-09-19 18:11:34 +00:00
Carl Ritson 6b8d75425e [AMDGPU] Add instruction selection for i1 to f16 conversion
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52018

Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
2018-09-19 16:32:12 +00:00
Andrea Di Biagio 8b6c314be1 [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions.
This patch adds the ability for processor models to describe dependency breaking
instructions.

Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.

The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).

```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
  return false;
}

virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
  return isZeroIdiom(MI);
}
```

An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.

A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.

STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.

This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.

This patch supersedes the one committed at r338372 (phabricator review: D49310).

The main advantages are:
 - We can describe subtarget predicates via tablegen using STIPredicates.
 - We can describe zero-idioms / dep-breaking instructions directly via
   tablegen in the scheduling models.

In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
 - Teach how to identify optimizable register-register moves
 - Teach how to identify slow LEA instructions (each subtarget defining its own
   concept of "slow" LEA).
 - Teach how to identify instructions that have undocumented false dependencies
   on the output registers on some processors only.

It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.

This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.

Differential Revision: https://reviews.llvm.org/D52174

llvm-svn: 342555
2018-09-19 15:57:45 +00:00
Sanjay Patel 4fd2e2a498 [DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add
This is an alternative to D37896. I don't see a way to decompose multiplies 
generically without a target hook to tell us when it's profitable. 

ARM and AArch64 may be able to remove some duplicate code that overlaps with 
this transform.

As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474

As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.

Differential Revision: https://reviews.llvm.org/D52195

llvm-svn: 342554
2018-09-19 15:57:40 +00:00
Fedor Sergeev 25de3f83be Revert rL342544: [New PM] Introducing PassInstrumentation framework
A bunch of bots fail to compile unittests. Reverting.

llvm-svn: 342552
2018-09-19 14:54:48 +00:00
Roman Lebedev f50023d37c [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask
Summary:
The last low-bit-mask-pattern-producing-pattern i can think of.

https://rise4fun.com/Alive/UGzE <- non-canonical
But we can not canonicalize it because of extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52148

llvm-svn: 342548
2018-09-19 13:35:46 +00:00
Roman Lebedev ca2bdb03d6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We can not canonicalize it due to the extra uses. But we can handle it here.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52147

llvm-svn: 342547
2018-09-19 13:35:40 +00:00
Roman Lebedev 183a465dc6 [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.

But required, i can re-prove, let me know.

* eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52146

llvm-svn: 342546
2018-09-19 13:35:27 +00:00
Oliver Stannard 0b835be7bb [ARM] Fix unwind information for floating point registers
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.

Patch by Jackson Woodruff!

Differential revision: https://reviews.llvm.org/D51494

llvm-svn: 342545
2018-09-19 13:25:31 +00:00
Fedor Sergeev 875c938fec [New PM] Introducing PassInstrumentation framework
Summary:
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@

The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.

Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
  and access to them.

* PassInstrumentation class that handles instrumentation-point interfaces
  that call into PassInstrumentationCallbacks.

* Callbacks accept StringRef which is just a name of the Pass right now.
  There were some ideas to pass an opaque wrapper for the pointer to pass instance,
  however it appears that pointer does not actually identify the instance
  (adaptors and managers might have the same address with the pass they govern).
  Hence it was decided to go simple for now and then later decide on what the proper
  mental model of identifying a "pass in a phase of pipeline" is.

* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
  on different IRUnits (e.g. Analyses).

* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
  usual AnalysisManager::getResult. All pass managers were updated to run that
  to get PassInstrumentation object for instrumentation calls.

* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
  args out of a generic PassManager's extra args. This is the only way I was able to explicitly
  run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
  RepeatedPass::run.
  TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
  and then get rid of getAnalysisResult by improving RepeatedPass implementation.

* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
  PassInstrumentationAnalysis. Callbacks registration should be performed directly
  through PassInstrumentationCallbacks.

* new-pm tests updated to account for PassInstrumentationAnalysis being run

* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
  Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.

Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858

llvm-svn: 342544
2018-09-19 12:25:52 +00:00
Benjamin Kramer e5e1ea79fd [InstCombine] Don't transform sin/cos -> tanl if for half types
This is still unsafe for long double, we will transform things into tanl
even if tanl is for another type. But that's for someone else to fix.

llvm-svn: 342542
2018-09-19 12:01:38 +00:00
Alex Bradbury 21aea51e71 [RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A
Introduce a new RISCVExpandPseudoInsts pass to expand atomic 
pseudo-instructions after register allocation. This is necessary in order to 
ensure that register spills aren't introduced between LL and SC, thus breaking 
the forward progress guarantee for the operation. AArch64 does something 
similar for CmpXchg (though only at O0), and Mips is moving towards this 
approach (see D31287). See also [this mailing list 
post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from 
James Knight, which summarises the issues with lowering to ll/sc in IR or 
pre-RA.

See the [accompanying RFC 
thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an 
overview of the lowering strategy.

Differential Revision: https://reviews.llvm.org/D47882

llvm-svn: 342534
2018-09-19 10:54:22 +00:00
Hans Wennborg 4195eb1068 [COFF] Emit @feat.00 on 64-bit and set the CFG bit when emitting guardcf tables
The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up
the .gfid$y table. I believe this is fine to set even if we don't emit
the instrumentation.

We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit
it, but I'm not entirely sure what the default value should be. I went
with zero since that seems as safe as not emitting the symbol in the
first place.

Differential Revision: https://reviews.llvm.org/D52235

llvm-svn: 342532
2018-09-19 09:58:30 +00:00
Simon Pilgrim e2b16389e7 [X86][SSE] Update extractelement test in preparation for D52140
SimplifyDemandedVectorEltsForTargetNode will remove most of this test unless get rid of the undefs - still testing for align 1 which was the point of the test

Removed out of date comment as well

llvm-svn: 342531
2018-09-19 09:50:32 +00:00
Carlos Alberto Enciso ba4e437c6a [DebugInfo][Dexter] Speculated BB presents illegal variable value to debugger.
When SimplifyCFG changes the PHI node into a select instruction, the debug information becomes ambiguous. It causes the debugger to display wrong variable value. 

Differential Revision: https://reviews.llvm.org/D51976

llvm-svn: 342527
2018-09-19 08:16:56 +00:00
Thomas Lively aaf4e2cbba [WebAssembly] v4f32.abs and v2f64.abs
Summary: implement lowering of @llvm.fabs for vector types.

Reviewers: aheejin, dschuff

Subscribers:

llvm-svn: 342513
2018-09-18 21:45:12 +00:00
Douglas Yung de94ea140c Revert r342494 as it was failing on a bot and the author cannot look at it until tomorrow.
Failing bot: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36708

llvm-svn: 342509
2018-09-18 19:34:05 +00:00
Christy Lee c85da8bd9a Do not optimize atomic load to non-atomic memcmp
Differential Revision: https://reviews.llvm.org/D51998

llvm-svn: 342498
2018-09-18 17:02:42 +00:00
Farhana Aleen f5a2848376 [AMDGPU] Match udot8 pattern
Summary: D.u32 = S0.u4[0] * S1.u4[0] +

         S0.u4[1] * S1.u4[1] +
         S0.u4[2] * S1.u4[2] +
         S0.u4[3] * S1.u4[3] +
         S0.u4[4] * S1.u4[4] +
         S0.u4[5] * S1.u4[5] +
         S0.u4[6] * S1.u4[6] +
         S0.u4[7] * S1.u4[7] +
         S2.u32

Author: FarhanaAleen

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D51947

llvm-svn: 342497
2018-09-18 16:59:48 +00:00
Christy Lee 1bc0fb4a3f Check lines before using alias analysis to check for interference
This diff is to show the difference before and after D51550

Differential Revision: https://reviews.llvm.org/D52044

llvm-svn: 342494
2018-09-18 16:43:44 +00:00
Zachary Turner c41ce8355f [PDB] Better support for enumerating pointer types.
There were several issues with the previous implementation.

1) There were no tests.
2) We didn't support creating PDBSymbolTypePointer records for
   builtin types since those aren't described by LF_POINTER
   records.
3) We didn't support a wide enough variety of builtin types even
   ignoring pointers.

This patch fixes all of these issues.  In order to add tests,
it's helpful to be able to ignore the symbol index id hierarchy
because it makes the golden output from the DIA version not match
our output, so I've extended the dumper to disable dumping of id
fields.

llvm-svn: 342493
2018-09-18 16:35:05 +00:00
Krzysztof Parzyszek c1e2f39b35 [PostRASink] Make sure to remove subregisters from live-ins as well
llvm-svn: 342492
2018-09-18 16:10:51 +00:00
Alex Bradbury 7d0e18d0dd [RISCV][MC] Reject bare symbols for the simm12 operand type
addi a0, a0, foo and lw a0, foo(a0) and similar are now rejected. An explicit 
%lo and %pcrel_lo modifier is required. This matches gas behaviour.

llvm-svn: 342487
2018-09-18 15:13:29 +00:00
Alex Bradbury 74340f1805 [RISCV][MC] Tighten up checking of sybol operands to lui and auipc
Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for 
lui. Also test valid operand modifiers in rv32i-valid.s.

Note this is slightly stricter than gas, which will accept either %pcrel_hi or 
%hi for both lui and auipc.

Differential Revision: https://reviews.llvm.org/D51731

llvm-svn: 342486
2018-09-18 15:08:35 +00:00
Hans Wennborg 01c3154971 Revert r342457 "Fixes removal of dead elements from PressureDiff (PR37252)."
This broke the lit tests on a bunch of buildbots, e.g.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36679

> Reviewed By: MatzeB
>
> Differential Revision: https://reviews.llvm.org/D51495

llvm-svn: 342482
2018-09-18 14:12:54 +00:00
Nemanja Ivanovic 87c31a6113 [PowerPC] Do not emit record-form rotates when record-form andi/andis suffices
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.

This patch increases the number of record-form rotates we eliminate by
more than 70%.

Differential revision: https://reviews.llvm.org/D44897

llvm-svn: 342478
2018-09-18 13:43:16 +00:00
Teresa Johnson 5e1c0e7610 [LTO] Make detection of WPD remark enablement more robust
Summary:
Currently only the first function in the module is checked to
see if it has remarks enabled. If that first function is a declaration,
remarks will be incorrectly skipped. Change to look for the first
non-empty function.

Reviewers: pcc

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51556

llvm-svn: 342477
2018-09-18 13:42:24 +00:00
Nemanja Ivanovic 6a39d32e66 [PowerPC] Optimize compares fed by ANDISo
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.

Differential revision: https://reviews.llvm.org/D51353

llvm-svn: 342472
2018-09-18 13:21:58 +00:00
John Brawn 83d7414e19 [TargetLowering] Android has sincos functions
Since Android API version 9 the Android libm has had the sincos functions, so
they should be recognised as libcalls and sincos optimisation should be applied.

Differential Revision: https://reviews.llvm.org/D52025

llvm-svn: 342471
2018-09-18 13:18:21 +00:00
Alexander Kornienko 4d9d76d800 Remove trailing whitespace introduced in r342440.
llvm-svn: 342463
2018-09-18 10:53:13 +00:00
Yury Gribov 53db663afb Fixes removal of dead elements from PressureDiff (PR37252).
Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D51495

llvm-svn: 342457
2018-09-18 09:53:42 +00:00
David Green 85d6a55995 [AArch64] Attempt to parse more operands as expressions
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.

This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.

Differential Revision: https://reviews.llvm.org/D51792

llvm-svn: 342455
2018-09-18 09:44:53 +00:00
Max Kazantsev 0994abda3a [IndVars] Remove unreasonable checks in rewriteLoopExitValues
A piece of logic in rewriteLoopExitValues has a weird check on number of
users which allowed an unprofitable transform in case if an instruction has
more than 6 users.

Differential Revision: https://reviews.llvm.org/D51404
Reviewed By: etherzhhb

llvm-svn: 342444
2018-09-18 04:57:18 +00:00
Matt Arsenault ebf46143ea AMDGPU: Don't form fmed3 if it will require materialization
If there is a single use constant, it can be folded into the
min/max, but not into med3.

llvm-svn: 342443
2018-09-18 02:34:54 +00:00
Matt Arsenault c640798597 LSV: Fix adjust alloca alignment trick for AMDGPU
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.

Also try to split if the unaligned access isn't allowed.

llvm-svn: 342442
2018-09-18 02:05:44 +00:00
QingShan Zhang f1b0b47b2d [PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040

llvm-svn: 342441
2018-09-18 02:05:18 +00:00
QingShan Zhang df1a2d8d2b [PowerPC][NFC] Add a mulld testcase for scheduling check.
This patch add a mulld testcase for scheduling check.

Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52039

llvm-svn: 342440
2018-09-18 01:59:22 +00:00
Matt Arsenault 9d49c449ec AMDGPU: Expand vector canonicalizes
llvm-svn: 342439
2018-09-18 01:51:33 +00:00
Volodymyr Sapsai 703ab84cf5 Revert "[ARM] Cleanup ARM CGP isSupportedValue"
This reverts r342395 as it caused error

> Argument value type does not match pointer operand type!
>   %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
>  i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!

on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

More details are available at https://reviews.llvm.org/D52080

llvm-svn: 342431
2018-09-18 00:11:55 +00:00
Reid Kleckner d38f9a6ce3 Work around grep vs. CRLF issue in Thumb2 test by matching excess whitespace
There seems to be a separate command line tokenization issue that
prevents just ':\s*$' from working, since then the pattern argument
isn't quoted, and grep.exe misinterprets the backslash somehow.

llvm-svn: 342430
2018-09-18 00:04:29 +00:00
Alina Sbirlea a782a70ad9 [EarlyCSEwMemorySSA] Add MSSA verification and tests to make EarlyCSE failures easier to track.
Summary:
EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result.
Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA.
Adding some tests to track this and a basic verify for future potential failures.

Reviewers: george.burgess.iv, gberry

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D51960

llvm-svn: 342422
2018-09-17 22:35:21 +00:00
Simon Atanasyan 9265dca8b5 [mips] Fix MIPS N32 ABI triples support
Add support mips64(el)-linux-gnuabin32 triples, and set them to N32.
Debian architecture name mipsn32/mipsn32el are also added. Set
UseIntegratedAssembler for N32 if we can detect it.

Patch by YunQiang Su.

Differential revision: https://reviews.llvm.org/D51408

llvm-svn: 342416
2018-09-17 21:21:57 +00:00
Zachary Turner bdf0381e21 [PDB] Make the native reader support enumerators.
Previously we would dump the names of enum types, but not their
enumerator values.  This adds support for enumerator values.  In
doing so, we have to introduce a general purpose mechanism for
caching symbol indices of field list members.  Unlike global
types, FieldList members do not have a TypeIndex.  So instead,
we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}.

llvm-svn: 342415
2018-09-17 21:08:11 +00:00
Zachary Turner 4727ac2394 [PDB] Make the native reader support modified types.
Previously for cv-qualified types, we would just ignore them
and they would never get printed.  Now we can enumerate them
and cache them like any other symbol type.

llvm-svn: 342414
2018-09-17 21:07:48 +00:00
Nirav Dave b1f9e4d285 [MC] Avoid inlining constant symbols with variants.
Summary:
Defer unnecessary early inlining of constants to symbol
variants. Fixes PR38945.

Reviewers: nickdesaulniers, rnk

Subscribers: nemanjai, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D52188

llvm-svn: 342412
2018-09-17 20:34:26 +00:00
Keno Fischer c8ccaed325 [X86ISel] Implement byval lowering for Win64 calling convention
Summary:
The IR reference for the `byval` attribute states:

```
This indicates that the pointer parameter should really be passed by value
to the function. The attribute implies that a hidden copy of the pointee is
made between the caller and the callee, so the callee is unable to modify
the value in the caller. This attribute is only valid on LLVM pointer arguments.
```

However, on Win64, this attribute is unimplemented and the raw pointer is
passed to the callee instead. This is problematic, because frontend authors
relying on the implicit hidden copy (as happens for every other calling
convention) will see the passed value silently (if mutable memory) or
loudly (by means of a crash) modified because the callee treats the
location as scratch memory space it is allowed to mutate.

At this point, it's worth taking a step back to understand the context.
In most calling conventions, aggregates that are too large to be passed
in registers, instead get *copied* to the stack at a fixed (computable
from the signature) offset of the stack pointer. At the LLVM, we hide
this hidden copy behind the byval attribute. The caller passes a pointer
to the desired data and the callee receives a pointer, but these pointers
are not the same. In particular, the pointer that the callee receives
points to temporary stack memory allocated as part of the call lowering.
In most calling conventions, this pointer is never realized in registers
or memory. The temporary memory is simply defined by an implicit
offset from the stack pointer at function entry.

Win64, uniquely, works differently. The structure is still passed in
memory, but instead of being stored at an implicit memory offset, the
caller computes a pointer to the temporary memory and passes it to
the callee as a regular pointer (taking up a register, or if all
registers are taken up, an additional stack slot). Presumably, this
was done to allow eliding the copy when passing aggregates through
several functions on the stack.

This explains why ignoring the `byval` attribute mostly works on Win64.
The argument simply gets passed as a pointer and as long as we're ok
with the callee trampling all over that memory, there are no ill effects.
However, it does contradict the documentation of the `byval` attribute
which specifies that there is to be an implicit copy.

Frontends can of course work around this by never emitting the `byval`
attribute for Win64 and creating `alloca`s for the requisite temporary
stack slots (and that does appear to be what frontends are doing).
However, the presence of the `byval` attribute is not a trap for
frontend authors, since it seems to work, but silently modifies the
passed memory contrary to documentation.

I see two solutions:
- Disallow the `byval` attribute in the verifier if using the Win64
  calling convention.
- Make it work by simply emitting a temporary stack copy as we would
  with any other calling convention (frontends can of course always
  not use the attribute if they want to elide the copy).

This patch implements the second option (make it work), though I would
be fine with the first also.

Ref: https://github.com/JuliaLang/julia/issues/28338

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51842

llvm-svn: 342402
2018-09-17 17:37:14 +00:00
Alexander Kornienko e74e0f11d1 Revert "[DWARF] reposting r342048, which was reverted in r342056 due to buildbot errors. Adjusted 2 test cases for ARM and darwin and fixed a bug with the original change in dsymutil."
This reverts commit r342218. Due to a number of failures under TSAN. An isolated
test case is being worked on.

llvm-svn: 342399
2018-09-17 15:40:01 +00:00
Amara Emerson 91c2913522 Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index.""
Fixed the assertion failure.

llvm-svn: 342397
2018-09-17 14:40:13 +00:00
Sam Parker 481cdab919 [ARM] Cleanup ARM CGP isSupportedValue
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.

Differential Revision: https://reviews.llvm.org/D52080

llvm-svn: 342395
2018-09-17 13:57:39 +00:00
Sam Parker 76d25d7f55 [ARM] Disallow icmp with negative imm and overflow
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.

Differential Revision: https://reviews.llvm.org/D52102

llvm-svn: 342392
2018-09-17 13:48:25 +00:00
Matt Arsenault 80ea6dd1d5 Fix vectorization of canonicalize
llvm-svn: 342390
2018-09-17 13:24:30 +00:00
Alexandros Lamprineas 8a1c374b2e [GVNHoist] Re-enable GVNHoist by default
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.

Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
  compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.

llvm-svn: 342387
2018-09-17 12:24:55 +00:00
Strahinja Petrovic 488fd4e625 [PowerPC] Fix label address calculation for ppc64
This patch fixes calculating address of label for non-pic ppc64.

Differential Revision: https://reviews.llvm.org/D50965

llvm-svn: 342368
2018-09-17 11:03:40 +00:00
James Henderson e29e40854b Reland r342233: [ThinLTO] Allow setting of maximum cache size with 64-bit number
The original was reverted due to an apparent build-bot test failure,
but it looks like this is just a flaky test.

Also added a C-interface function for large values, and updated
llvm-lto's --thinlto-cache-max-size-bytes switch to take a type larger
than int.

The maximum cache size in terms of bytes is a 64-bit number. However,
the methods to set it only took unsigned previously, which meant that
the maximum cache size could not be specified above 4GB. That's quite
small compared to the output of some projects, so it makes sense to
provide the ability to set larger values in that field.

We also needed a C-interface function that provides a greater range
than the existing thinlto_codegen_set_cache_size_bytes, which also only
takes an unsigned, so this change also adds
hinlto_codegen_set_cache_size_megabytes.

Reviewed by: mehdi_amini, tejohnson, steven_wu

Differential Revision: https://reviews.llvm.org/D52023

llvm-svn: 342366
2018-09-17 10:21:26 +00:00
Alexander Shaposhnikov 1de445c71c [llvm-objcopy] Add missing alias for --strip-all-gnu
This diff adds -S as an alias for --strip-all-gnu 
(for compatibility with binutils' objcopy).

Patch by Dmitry Golovin!

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D52163

llvm-svn: 342364
2018-09-17 09:45:12 +00:00
Simon Pilgrim cffa206423 [X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16
For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering.

llvm-svn: 342352
2018-09-16 20:28:38 +00:00
Simon Pilgrim ea069ffd44 [X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16
Now that rL340913 has landed with improved v16i16 selects as shuffles.

llvm-svn: 342349
2018-09-16 19:20:47 +00:00
Sanjay Patel 3eaf500a6d [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.

Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. 
Otherwise, no regressions on x86-64 or A64."

I'm proposing to add only the minimum support for a DAG node here. Since we don't have an 
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I 
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We 
should be able to expand as needed when adding more functionality/transforms. For reference, 
these are transform suggestions currently listed in SimplifyLibCalls.cpp:

//   * cbrt(expN(X))  -> expN(x/3)
//   * cbrt(sqrt(x))  -> pow(x,1/6)
//   * cbrt(cbrt(x))  -> pow(x,1/9)

Also, given that we bail out on long double for now, there should not be any logical 
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).

Differential Revision: https://reviews.llvm.org/D51753

llvm-svn: 342348
2018-09-16 16:50:26 +00:00
Sanjay Patel bfee5a9b42 [x86] fix uses check in broadcast transform (PR38949)
https://bugs.llvm.org/show_bug.cgi?id=38949

It's not clear to me that we even need a one-use check in this fold.
Ie, 2 independent loads might be better than a load+dependent shuffle.

Note that the existing re-use tests are not affected. We actually do form a
broadcast node in those tests now because there's no extra use of the 
insert_subvector node in those cases. But something later in isel pattern 
matching decides that it is not worth using a broadcast for the full load in 
those tests:

Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:'
  t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
      t4: i64,ch = CopyFromReg t0, Register:i64 %1
    t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
      t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0>
    t20: v4f64 = insert_subvector t18, t7, Constant:i64<2>

Becomes:
  t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
      t4: i64,ch = CopyFromReg t0, Register:i64 %1
    t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
    t21: v4f64 = X86ISD::SUBV_BROADCAST t7

ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7
...
  Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7>
  Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1>

llvm-svn: 342347
2018-09-16 15:41:56 +00:00
Sanjay Patel 3e095174b0 [x86] add failure to splat test (PR38949); NFC
llvm-svn: 342346
2018-09-16 14:59:04 +00:00
Roman Lebedev 6356864e6d [NFC][InstCombine] One more test pattern for comparisons with low-bit-mask.
https://rise4fun.com/Alive/UGzE <- non-canonical, but has extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

llvm-svn: 342345
2018-09-16 12:51:09 +00:00
Roman Lebedev 3fb9414d02 [NFC][InstCombine] Some more tests for comparisons with low-bit-mask.
https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

llvm-svn: 342343
2018-09-16 08:05:06 +00:00
Craig Topper 2da7381678 [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.

This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52075

llvm-svn: 342335
2018-09-15 18:54:10 +00:00
Simon Pilgrim fc4c26485c [X86][SSE] Fix insertps load combine test name
The existing test was called extract_lane_insertps_5123 but it was in fact doing a <6,1,2,3> shuffle. I've fixed the name and added the <5,1,2,3> test case as well.

llvm-svn: 342328
2018-09-15 16:57:04 +00:00
Craig Topper fe0b973fbf [X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64.
Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back?

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52134

llvm-svn: 342327
2018-09-15 16:23:35 +00:00
Craig Topper 273f755da3 [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))
Summary:
MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.

Inspired by PR38840.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D52121

llvm-svn: 342326
2018-09-15 16:23:33 +00:00
Sanjay Patel 296d35a5e9 [InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814)
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814

If this works, it's an easier and more powerful solution than adding pattern matching 
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might 
not be able to make a blendv out of it again. I don't think that's too likely, but 
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild 
before expanding the transform.

Differential Revision: https://reviews.llvm.org/D52059

llvm-svn: 342324
2018-09-15 14:25:44 +00:00
Simon Pilgrim 7bfe87181d Fix line endings. NFCI.
llvm-svn: 342323
2018-09-15 14:20:53 +00:00
Roman Lebedev 1b7fc87020 [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

The last (as far i know?) pattern, non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52062

llvm-svn: 342321
2018-09-15 12:04:13 +00:00
Vedant Kumar 1b02dad9f2 [CodeGenPrepare] Preserve debug locs in OptimizeExtractBits
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g UBFX).

Teach it to preserve debug locations and salvage debug values.

llvm-svn: 342319
2018-09-15 04:08:52 +00:00
Thomas Lively 66f3dc031d [WebAssembly][NFC] Generalize operand numbers in SIMD tests
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52130

llvm-svn: 342303
2018-09-15 01:12:48 +00:00
Thomas Lively f2550e0c44 [WebAssembly] SIMD shifts
Summary:
Implement shifts of vectors by i32. Since LLVM defines shifts as
binary operations between two vectors, this involves pattern matching
on splatted shift operands. For v2i64 shifts any i32 shift operands
have to be zero extended in the input and any i64 shift operands have
to be wrapped in the output. Depends on D52007.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D51906

llvm-svn: 342302
2018-09-15 00:45:31 +00:00
Thomas Lively 88b7443f94 [WebAssembly] SIMD neg
Summary: Depends on D52007.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52009

llvm-svn: 342296
2018-09-14 22:35:12 +00:00
Zachary Turner a98ee586bf [PDB] Make the pretty dumper output modified types.
Currently if we got something like `const Foo` we'd ignore it and
just rely on printing the unmodified `Foo` later on.  However,
for testing the native reading code we really would like to be able
to see these so that we can verify that the native reader can
actually handle them.  Instead of printing out the full type though,
just print out the header.

llvm-svn: 342295
2018-09-14 22:29:19 +00:00
Sanjay Patel 90a36346bc [InstCombine] refactor mul narrowing folds; NFCI
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.

D52075 should be able to use this code too rather than
duplicating all of the logic.

llvm-svn: 342292
2018-09-14 22:23:35 +00:00
Lion Yang c68f78d5d8 [PowerPC] Fix the calling convention for i1 arguments on PPC32
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.

"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.

Fixes PR38661

Reviewers: hfinkel, cuviper

Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D51108

llvm-svn: 342288
2018-09-14 21:26:05 +00:00
Thomas Lively a3937b231d [WebAssembly][NFC] Move SIMD encoding tests to dedicated file
Summary:
This change makes the tests more focused and avoids problematic
interactions between the testing modes and instruction encoding. This
change also allows the other tests to use less verbose output and
stricter checks.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52007

llvm-svn: 342287
2018-09-14 21:21:42 +00:00
Wei Mi 6a14325dff [SampleFDO] Add FunctionOffsetTable in compact binary format profile.
The patch saves a function offset table which maps function name index to the
offset of its function profile to the start of the binary profile. By using
the function offset table, for those function profiles which will not be used
when compiling a module, the profile reader does't have to read them. For
profile size around 10~20M, it saves ~10% compile time.

Differential Revision: https://reviews.llvm.org/D51863

llvm-svn: 342283
2018-09-14 20:52:59 +00:00
Fangrui Song d39d374e15 test/Other/can-execute.txt: delete %t after the test
This test constructs a non-readable file of mode 0111, which lingers in the test output directory and will cause EACCES to various tools (rg, rsync, ...)

llvm-svn: 342279
2018-09-14 20:41:42 +00:00
Sanjay Patel 2426eb46dd [InstCombine] refactor add narrowing folds; NFCI
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.

llvm-svn: 342278
2018-09-14 20:40:46 +00:00
Sebastian Pop 0f30f08b02 HotColdSplit: fix invalid SSA due to outlining
The test used to fail with an invalid phi node: the two predecessors were outlined
and the SSA representation was left invalid. The patch adds the exit block to the
cold region.

llvm-svn: 342277
2018-09-14 20:36:19 +00:00
Sanjay Patel fcf8c7c908 [InstCombine] add more tests for add narrowing folds; NFC
llvm-svn: 342274
2018-09-14 20:33:40 +00:00
Thomas Lively e1f67a8bf7 [WebAssembly][NFC] Fix unconventional test names
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52110

llvm-svn: 342273
2018-09-14 20:22:45 +00:00
Reid Kleckner 4d1b75c6b7 Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."
Causes 'isVector() && "Invalid vector type!"' assertion when building
Skia in Chrome.

llvm-svn: 342265
2018-09-14 19:39:40 +00:00
Adrian Prantl 16f58d1850 Fix debug info for SelectionDAG legalization of DAG nodes with two results.
This patch fixes the debug info handling for SelectionDAG legalization
of DAG nodes with two results. When an replaced SDNode has more than
one result, transferDbgValues was always copying the SDDbgValue from
the first result and attaching them to all members. In reality
SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes
(though the type signature doesn't make this obvious (cf. the call
site code in ReplaceNode()).

rdar://problem/44162227

Differential Revision: https://reviews.llvm.org/D52112

llvm-svn: 342264
2018-09-14 19:38:45 +00:00
Reid Kleckner 00f0ee718f Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"
It causes assertion failures while building Skia for Android in
Chromium:
https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550

Reduction forthcoming.

llvm-svn: 342260
2018-09-14 18:44:37 +00:00
Simon Pilgrim 4c30f3d4e6 Revert a line-endings change that somehow got included with rL342257
llvm-svn: 342258
2018-09-14 18:35:21 +00:00
Simon Pilgrim 32857c54d2 [X86][SSE] Lower shuffles to permute(unpack(x,y)) (PR31151)
Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation.

As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work.

Differential Revision: https://reviews.llvm.org/D52043

llvm-svn: 342257
2018-09-14 18:33:31 +00:00
Craig Topper ac356cac0c [X86] Re-generate test checks using current version of the script. NFC
The regular expression used for stack accesses is different today.

llvm-svn: 342256
2018-09-14 18:27:09 +00:00
Sanjay Patel 003f452522 [InstCombine] rename test file to better describe the fold; NFC
The folds are not limited to zext, and the real goal is width
reduction of a math op. D52075 is proposing to extend this to
subtracts.

llvm-svn: 342254
2018-09-14 18:12:30 +00:00
Sanjay Patel 5a9462e42a [InstCombine] remove unnecessary target constraints for tests; NFC
These are universal folds.

llvm-svn: 342253
2018-09-14 18:06:36 +00:00
Sanjay Patel 7b9e1afd1f [InstCombine] move test next to related tests; NFC
llvm-svn: 342251
2018-09-14 18:05:14 +00:00
Sanjay Patel 1f4f26a2bb [InstCombine] remove stall comment from test file; NFC
llvm-svn: 342250
2018-09-14 18:02:17 +00:00
Sanjay Patel f7ba0ac0b5 [InstCombine] regenerate test checks; NFC
There was a bug in a check line regex that could cause the test to fail
with a naming difference. The auto-gen script seems to work as expected now.

llvm-svn: 342249
2018-09-14 17:53:44 +00:00
James Henderson 13f426304f Revert r342233.
This caused LLD test failures, which I've been unable to reproduce.

Reverting to allow for further investigation next week.

llvm-svn: 342244
2018-09-14 16:48:47 +00:00
Sanjay Patel b437238e95 [InstCombine] add more tests for x86 blendv (PR38814); NFC
llvm-svn: 342237
2018-09-14 13:47:33 +00:00
Simon Pilgrim 1c1335a10d [X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2
These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64.

llvm-svn: 342235
2018-09-14 13:31:14 +00:00
James Henderson 48c0688a36 [ThinLTO]Allow setting of maximum cache size with 64-bit number
Also added a C-interface function for large values, and updated
llvm-lto's --thinlto-cache-max-size-bytes switch to take a type larger
than int.

The maximum cache size in terms of bytes is a 64-bit number. However,
the methods to set it only took unsigned previously, which meant that
the maximum cache size could not be specified above 4GB. That's quite
small compared to the output of some projects, so it makes sense to
provide the ability to set larger values in that field.

We also needed a C-interface function that provides a greater range
than the existing thinlto_codegen_set_cache_size_bytes, which also only
takes an unsigned, so this change also adds
hinlto_codegen_set_cache_size_megabytes.

Reviewed by: mehdi_amini, tejohnson, steven_wu

Differential Revision: https://reviews.llvm.org/D52023

llvm-svn: 342233
2018-09-14 12:51:19 +00:00