
24263 Commits

Author SHA1 Message Date
Steven Wu d0804aa6dc [MachO] Emit Weak ReadOnlyWithRel to ConstDataSection
Summary:
The Darwin dynamic linker can handle weak symbols in ConstDataSection.
ReadOnlyWithRel symbols should be emitted in ConstDataSection
instead of the normal DataSection.

rdar://problem/39298457

Reviewers: dexonsmith, kledzik

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45472

llvm-svn: 329752
2018-04-10 20:16:35 +00:00
Amara Emerson e27d5016ef [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.
rdar://39175175

llvm-svn: 329743
2018-04-10 19:01:58 +00:00
Gabor Buella 213edc4a15 [X86] Split up -march=icelake to -client & -server
Reviewers: craig.topper, zvi, echristo

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45055

llvm-svn: 329742
2018-04-10 18:59:13 +00:00
Krzysztof Parzyszek 71a4c0ca07 [CodeGen] Fix printing bundles in MIR output
Delay printing the newline until after the opening bracket was
printed, e.g.
  BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 {
    renamable $r1 = S2_asr_i_r renamable $r1, 1
    renamable $r21 = A2_tfrsi 0
  }
instead of
  BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1
 {    renamable $r1 = S2_asr_i_r renamable $r1, 1
    renamable $r21 = A2_tfrsi 0
  }
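A minimal C++ sketch of the corrected ordering, using a hypothetical `printBundle` helper rather than LLVM's actual MIR printer: the opening bracket is appended to the header line before the newline is written, so each bundled instruction starts on its own line.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy printer, not LLVM's MIRPrinter. It emits the header, then
// " {" on the same line, and only then the newline; bundled instructions are
// indented one level deeper and the bundle is closed with "}".
std::string printBundle(const std::string &Header,
                        const std::vector<std::string> &Insts) {
  std::string Out = "  " + Header;
  if (!Insts.empty())
    Out += " {"; // opening bracket stays on the header line
  Out += "\n";   // the newline is printed after the bracket
  for (const std::string &I : Insts)
    Out += "    " + I + "\n";
  if (!Insts.empty())
    Out += "  }\n";
  return Out;
}
```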

llvm-svn: 329719
2018-04-10 16:46:13 +00:00
Peter Collingbourne a7d936f0c0 Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."
Caused a build failure in check-tsan.

llvm-svn: 329718
2018-04-10 16:19:30 +00:00
Francis Visoiu Mistrih f2c22050e8 [AArch64] Use FP to access the emergency spill slot
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.

This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.

The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.
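As a rough illustration of the selection rule, here is a hypothetical `chooseBase` helper (not the AArch64 frame-lowering code) that picks whichever available base register gives the smallest-magnitude offset. The real logic has more constraints; this only shows the "minimize the offset" idea.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Hypothetical model: a frame slot may be addressable relative to FP, BP or
// SP, and any of them may be unavailable in a given function.
enum class Base { FP, BP, SP };

struct Choice {
  Base Reg;
  int64_t Offset;
};

// Picks the available base whose offset to the slot has the smallest
// magnitude, so the immediate is most likely to fit without scavenging a
// scratch register. Ties are resolved in FP, BP, SP order.
std::optional<Choice> chooseBase(std::optional<int64_t> FPOff,
                                 std::optional<int64_t> BPOff,
                                 std::optional<int64_t> SPOff) {
  std::optional<Choice> Best;
  auto consider = [&](Base R, std::optional<int64_t> Off) {
    if (!Off)
      return;
    if (!Best || std::llabs(*Off) < std::llabs(Best->Offset))
      Best = Choice{R, *Off};
  };
  consider(Base::FP, FPOff);
  consider(Base::BP, BPOff);
  consider(Base::SP, SPOff);
  return Best;
}
```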

Previously discussed here: https://reviews.llvm.org/D40876.

Differential Revision: https://reviews.llvm.org/D45358

llvm-svn: 329691
2018-04-10 11:29:40 +00:00
Tim Renouf 7190a4692a [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
2018-04-10 11:25:15 +00:00
Chandler Carruth 0ca3bd0729 [x86] Model the direction flag (DF) separately from the rest of EFLAGS.
This cleans up a number of operations that only claimed to use EFLAGS
due to using DF. But no instructions which we think of as setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.

In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.

I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.

Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.

Differential Revision: https://reviews.llvm.org/D45154

llvm-svn: 329673
2018-04-10 06:40:51 +00:00
Craig Topper 7e42af87a6 [X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits.
Prefer to use the 32-bit AND with immediate instead.

Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
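The arithmetic behind the preference can be checked in isolation. This sketch assumes "fits in 32 bits" means the upper 32 bits of the mask are zero; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when a 64-bit AND mask has all-zero upper 32 bits. Only then is the
// 64-bit AND interchangeable with a 32-bit AND, because a 32-bit operation
// on x86-64 zero-extends its result into the full 64-bit register.
bool maskFitsIn32Bits(uint64_t Mask) { return (Mask >> 32) == 0; }

// Reference semantics of the 64-bit form.
uint64_t and64(uint64_t X, uint64_t Mask) { return X & Mask; }

// Emulates `and r32, imm32`: operate on the low halves, zero the rest.
uint64_t and32ZeroExtended(uint64_t X, uint64_t Mask) {
  return static_cast<uint32_t>(X) & static_cast<uint32_t>(Mask);
}
```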

Fixes PR37063.

llvm-svn: 329667
2018-04-10 03:44:15 +00:00
Chandler Carruth 19618fc639 [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.

However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
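One way to preserve CF across a clobber, sketched below as a plain C++ emulation with hypothetical helper names: save the flag into a byte with a setcc-style step, then recreate it just before the `adc` by adding 255 to that byte, which carries out of bit 7 exactly when the saved value is 1. Treat this as an illustration of the idea, not as the pass's exact instruction sequence.

```cpp
#include <cassert>
#include <cstdint>

// setb-like step: capture CF into a byte register while the flags are live.
uint8_t saveCarry(bool CF) { return CF ? 1 : 0; }

// Recreate CF by adding 255 to the saved byte: 1 + 255 overflows 8 bits
// (carry set), 0 + 255 does not (carry clear).
bool rematerializeCarry(uint8_t Saved) {
  unsigned Sum = static_cast<unsigned>(Saved) + 255u;
  return Sum > 0xFF; // carry out of bit 7
}

// adc semantics on 32-bit values, fed by the recreated carry.
uint32_t adc32(uint32_t A, uint32_t B, uint8_t SavedCF) {
  return A + B + (rematerializeCarry(SavedCF) ? 1u : 0u);
}
```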

There are a large number of operations that technically observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.

This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without requiring stack adjustments.

Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.

Differential Revision: https://reviews.llvm.org/D45146

llvm-svn: 329657
2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich 0cdc6ec535 ShadowCallStack/x86_64: Ignore pseudo-machine instructions
llvm-svn: 329656
2018-04-10 01:31:01 +00:00
Simon Pilgrim 3a8fc92865 [X86] Added missing AAD/AAM immediate schedule tests
Added some more TODOs for missing instructions

llvm-svn: 329626
2018-04-09 21:46:57 +00:00
Craig Topper 47b2f9d836 [X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW.
As its name says, LowerIntUnary has an assert for integer types. But for the bitcast case, one side might be an FP type.

Rather than making sure the function really works for FP types and renaming it, just do really basic splitting directly. LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during lowering. But these calls are during legalization and will be followed by a DAG combine round.

Revert some changes to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.

This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.

llvm-svn: 329616
2018-04-09 20:37:14 +00:00
Peter Collingbourne 5cff2409ae AArch64: Allow offsets to be folded into addresses with ELF.
This is a code size win in code that frequently takes offset addresses,
such as C++ constructors that typically need to compute
an offset address of a vtable. It reduces the size of Chromium for
Android's .text section by 46KB, or 56KB with ThinLTO (which exposes
more opportunities to use a direct access rather than a GOT access).

Because the addend range is limited in COFF and Mach-O, this is
enabled for ELF only.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329611
2018-04-09 19:59:57 +00:00
Alex Shlyapnikov 79f2c720b5 Revert "AMDGPU: enable 128-bit for local addr space under an option"
This reverts commit r329591.

It breaks various bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
...

llvm-svn: 329610
2018-04-09 19:47:38 +00:00
Craig Topper 3a0cab73eb [X86] Remove GCCBuiltin name from pmuldq/pmuludq intrinsics so clang can custom lower to native IR. Update fast-isel intrinsic tests for clang's new codegen.
In some cases fast-isel fails to remove the and/shifts and uses blends or conditional moves.

But once masking gets involved, fast-isel aborts on the mask portion and we DAG combine more thoroughly.

llvm-svn: 329604
2018-04-09 19:17:38 +00:00
Craig Topper 0c2a12cb3e [X86] Revert the SLM part of r328914.
While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.

llvm-svn: 329593
2018-04-09 17:07:40 +00:00
Marek Olsak 52b033b827 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.

However, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329591
2018-04-09 16:56:32 +00:00
Simon Pilgrim 14566ea6ef [X86][SSE] Add floating point add/mul strict (ordered) vector.reduce tests (PR36732)
llvm-svn: 329587
2018-04-09 16:01:44 +00:00
Simon Pilgrim 23c2182c2b Support generic expansion of ordered vector reduction (PR36732)
Without fast-math flags, the llvm.experimental.vector.reduce.fadd/fmul intrinsics must be expanded in order.

This patch scalarizes the reduction, applying the accumulator at the start of the sequence: ((((Acc + Scl[0]) + Scl[1]) + Scl[2]) + ...) + Scl[NumElts-1]
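A scalar C++ sketch of the strict ordering (function names are hypothetical). The pairwise version is included only to show why reassociation is not allowed without fast-math flags: with floats, the two orders can produce different results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict (ordered) fadd reduction: start from the accumulator and fold the
// elements strictly left to right, i.e.
// ((((Acc + Scl[0]) + Scl[1]) + Scl[2]) + ...) + Scl[NumElts-1].
float orderedReduceFAdd(float Acc, const std::vector<float> &Scl) {
  for (float V : Scl)
    Acc = Acc + V;
  return Acc;
}

// For contrast only: a pairwise (tree) reduction, the kind of reassociated
// shape that is only permitted when fast-math flags allow it.
float pairwiseReduceFAdd(float Acc, std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 && "power-of-2 width");
  while (V.size() > 1) {
    std::vector<float> Next;
    for (std::size_t I = 0; I < V.size(); I += 2)
      Next.push_back(V[I] + V[I + 1]);
    V = Next;
  }
  return Acc + V[0];
}
```

For example, with `Acc = 0` and elements `{1e20f, 1.0f, -1e20f, 1.0f}`, the ordered sum is `1.0f` while the pairwise sum is `0.0f`.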

Differential Revision: https://reviews.llvm.org/D45366

llvm-svn: 329585
2018-04-09 15:44:20 +00:00
Zaara Syeda 935474fef5 [MachineLICM] Re-enable hoisting of constant stores
This patch fixes an issue exposed on the SystemZ build bots when committing
https://reviews.llvm.org/rL327856. The hoisting was temporarily disabled with
an option. This patch now re-enables hoisting and checks that we only hoist a
store instruction when all its operands are either constant caller preserved
registers or immediates.

Differential Revision: https://reviews.llvm.org/D45286

llvm-svn: 329577
2018-04-09 14:50:02 +00:00
Simon Pilgrim e5ed5e2cba [X86][MMX] Fix missing itinerary for PALIGNR
llvm-svn: 329568
2018-04-09 13:52:33 +00:00
Simon Pilgrim 140fee078f [X86][MMX] Fix missing itinerary for MOVQ2DQ instruction format
llvm-svn: 329567
2018-04-09 13:42:14 +00:00
Simon Pilgrim abf3611332 [X86][MMX] Fix missing itinerary for CVTPI2PS
llvm-svn: 329565
2018-04-09 13:27:47 +00:00
Simon Pilgrim 0047efdd1e [X86][MMX] Fix flipped reg/mem typo in MMX_MISC_FUNC_ITINS
The RR/RM itineraries were the wrong way around

llvm-svn: 329561
2018-04-09 13:02:07 +00:00
Simon Pilgrim 6131286553 [X86][SSE] Fix f32 mul/div itinerary groups typo
The RM folded itineraries were incorrectly using the f64 version.

llvm-svn: 329556
2018-04-09 10:45:53 +00:00
Sam Parker 1f4f4d9a08 [DAGCombine] Improve ReduceLoad for SRL
Recommitting r329283, third time lucky...

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
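A byte-level sketch of the narrowing for the common case of a `0xFF` mask, assuming a little-endian layout (a big-endian target would need a different byte offset). `wideLoadShiftMask` models the original `(and (srl (load i32), ShAmt), Mask)` and `narrowLoadI8` models the narrowed zero-extending i8 load; both names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Little-endian 32-bit load, assembled explicitly from bytes so the result
// does not depend on the host's endianness.
uint32_t loadLE32(const std::array<uint8_t, 4> &Mem) {
  return uint32_t(Mem[0]) | (uint32_t(Mem[1]) << 8) |
         (uint32_t(Mem[2]) << 16) | (uint32_t(Mem[3]) << 24);
}

// Original pattern: (and (srl (load i32), ShAmt), Mask).
uint32_t wideLoadShiftMask(const std::array<uint8_t, 4> &Mem, unsigned ShAmt,
                           uint32_t Mask) {
  return (loadLE32(Mem) >> ShAmt) & Mask;
}

// Narrowed form when Mask == 0xFF and ShAmt is a multiple of 8: one
// zero-extending i8 load at byte offset ShAmt / 8; the AND becomes redundant.
uint32_t narrowLoadI8(const std::array<uint8_t, 4> &Mem, unsigned ShAmt) {
  assert(ShAmt % 8 == 0 && ShAmt < 32);
  return Mem[ShAmt / 8];
}
```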

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329551
2018-04-09 08:16:11 +00:00
Michael Zolotukhin 8d052a0dd2 Remove MachineLoopInfo dependency from AsmPrinter.
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes the
pass manager compute it and MachineDominatorTree. This patch removes the
use (1) and makes the use (2) lazy, thus avoiding some redundant
recomputations.
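The lazy pattern can be sketched in plain C++ with hypothetical types (`LoopAnalysis`, `ToyAsmPrinter`) standing in for MachineLoopInfo and AsmPrinter: the analysis is constructed on first request, and only verbose output ever requests it.

```cpp
#include <cassert>
#include <memory>

// Counts how often the stand-in analysis has been constructed.
static int AnalysisBuilds = 0;

// Hypothetical stand-in for an expensive analysis such as MachineLoopInfo
// (which in turn needs a dominator tree).
struct LoopAnalysis {
  LoopAnalysis() { ++AnalysisBuilds; }
  unsigned loopDepth(unsigned /*Block*/) const { return 0; }
};

// Hypothetical printer: the analysis is built on first use only.
class ToyAsmPrinter {
  bool Verbose;
  std::unique_ptr<LoopAnalysis> LI;

  LoopAnalysis &getLoopInfo() {
    if (!LI)
      LI = std::make_unique<LoopAnalysis>();
    return *LI;
  }

public:
  explicit ToyAsmPrinter(bool Verbose) : Verbose(Verbose) {}

  void emitBlock(unsigned Block) {
    if (Verbose) // loop comments are only emitted in verbose mode
      (void)getLoopInfo().loopDepth(Block);
  }
};

// Returns how many times the analysis was built while emitting NumBlocks.
int analysesBuilt(bool Verbose, unsigned NumBlocks) {
  int Before = AnalysisBuilds;
  ToyAsmPrinter P(Verbose);
  for (unsigned B = 0; B < NumBlocks; ++B)
    P.emitBlock(B);
  return AnalysisBuilds - Before;
}
```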

Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi

Subscribers: rengolin, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44812

llvm-svn: 329542
2018-04-09 00:54:47 +00:00
Craig Topper b7baa358f6 [X86] Add SchedWrites for CMOV and SETCC. Use them to remove InstRWs.
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.

This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.

The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.

Reviewers: RKSimon, andreadb, GGanesh

Reviewed By: RKSimon

Subscribers: GGanesh, llvm-commits

Differential Revision: https://reviews.llvm.org/D45380

llvm-svn: 329539
2018-04-08 17:53:18 +00:00
Craig Topper c362f42b6a [X86][Znver1] Remove InstRWs for BLENDVPS/PD
Summary:
This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data.

The patterns were missing the VEX forms, which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx".

Reviewers: RKSimon, GGanesh

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44841

llvm-svn: 329538
2018-04-08 17:53:15 +00:00
Simon Pilgrim bf2df1e26c [X86] Regenerate and + immediate mask tests
Added i686 checks

llvm-svn: 329529
2018-04-08 12:31:52 +00:00
Simon Pilgrim 44374cf7b0 [X86][PKU] Regenerate rdpkru/wrpkru intrinsic tests
Added i686 checks

llvm-svn: 329528
2018-04-08 12:30:30 +00:00
Simon Pilgrim 14df0ae8d2 [X86][SSE3] Regenerate mwait/monitor intrinsic tests
Added i686 checks

llvm-svn: 329527
2018-04-08 12:29:11 +00:00
Zvi Rackover 7a53f169f1 DAGCombiner: Combine SDIV with non-splat vector pow2 divisor
Summary:
Extend the existing SDIV combine for pow2 constant divisors to handle
non-splat vectors of pow2 constants.
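A scalar C++ sketch of the usual shift-based expansion, applied lane by lane to model a non-splat divisor vector. The helper names are hypothetical and this is not the DAGCombiner code; it only shows why each lane can use its own shift amount.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signed division by 2^K (0 <= K <= 30) without a divide: negative
// dividends get a bias of 2^K - 1 so the arithmetic shift rounds toward
// zero, like C++'s `/`. Assumes >> on a negative int32_t is arithmetic
// (guaranteed since C++20 and done by mainstream compilers before that).
int32_t sdivPow2(int32_t X, unsigned K) {
  assert(K <= 30);
  if (K == 0)
    return X;
  uint32_t SignMask = static_cast<uint32_t>(X >> 31);        // all ones if X < 0
  int32_t Bias = static_cast<int32_t>(SignMask >> (32 - K)); // 2^K - 1 or 0
  return (X + Bias) >> K;
}

// Non-splat vector case: every lane has its own power-of-two exponent.
std::array<int32_t, 4> sdivPow2Vec(const std::array<int32_t, 4> &X,
                                   const std::array<unsigned, 4> &K) {
  std::array<int32_t, 4> R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = sdivPow2(X[I], K[I]);
  return R;
}
```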

Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma

Reviewed By: RKSimon

Subscribers: magabari, llvm-commits

Differential Revision: https://reviews.llvm.org/D42479

llvm-svn: 329525
2018-04-08 11:35:20 +00:00
Simon Pilgrim 86588fc809 [X86][Btver2] Add vector extract costs
llvm-svn: 329524
2018-04-08 11:26:26 +00:00
Guozhi Wei 0eb86c8efc [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real-world application, we found that the following optimization is missed in DAGCombiner:

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization.
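A C++ sketch that exhaustively checks the `and` + `srl` instance of the fold over every i8 input (names are hypothetical). Left shifts are omitted: combined with `or` or `xor`, the wide shift can keep bits above bit 7 that the narrow shift discards, so that case needs an extra check before the rewrite is valid.

```cpp
#include <cassert>
#include <cstdint>

// (zext (and (srl (load i8 x), C1), C2)): the original shape.
uint32_t beforeFold(uint8_t X, unsigned C1, uint8_t C2) {
  return static_cast<uint32_t>(static_cast<uint8_t>((X >> C1) & C2));
}

// (and (srl (zextload i8 x), C1), (zext C2)): the folded shape, whose wide
// result can feed an add (and possibly an LEA) directly.
uint32_t afterFold(uint8_t X, unsigned C1, uint8_t C2) {
  uint32_t Wide = X;                  // zextload
  return (Wide >> C1) & uint32_t(C2); // zext of the constant
}

// Exhaustively checks the identity for all i8 inputs (C1 must be < 8).
bool foldHoldsForAllBytes(unsigned C1, uint8_t C2) {
  assert(C1 < 8);
  for (unsigned V = 0; V < 256; ++V)
    if (beforeFold(uint8_t(V), C1, C2) != afterFold(uint8_t(V), C1, C2))
      return false;
  return true;
}
```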

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 329516
2018-04-07 23:36:10 +00:00
Simon Pilgrim d6981b1d37 [X86] Regenerate atom pshufb test
llvm-svn: 329511
2018-04-07 19:50:09 +00:00
Craig Topper ef37aebc96 [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering.
Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget.
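The per-lane identities such a combine relies on, written in scalar C++ (function names are hypothetical): PMULUDQ multiplies the low 32 bits of each 64-bit lane as unsigned values and PMULDQ as signed values, so either one equals a full 64-bit multiply only when the operands are already zero- or sign-extended from 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Per-lane semantics of PMULUDQ: multiply the low 32 bits of each 64-bit
// lane as unsigned values, producing a full 64-bit product.
uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return uint64_t(uint32_t(A)) * uint64_t(uint32_t(B));
}

// Per-lane semantics of PMULDQ: the same, but the low halves are treated
// as signed and sign-extended before multiplying.
int64_t pmuldqLane(int64_t A, int64_t B) {
  return int64_t(int32_t(A)) * int64_t(int32_t(B));
}
```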

llvm-svn: 329510
2018-04-07 19:09:52 +00:00
Craig Topper 5b95eae1c3 [DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector.
llvm-svn: 329509
2018-04-07 19:09:50 +00:00
Tim Northover e25e458d52 Reapply ARM: Do not spill CSR to stack on entry to noreturn functions
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.

A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch mostly by myeisha (pmb).

llvm-svn: 329494
2018-04-07 10:57:03 +00:00
Vitaly Buka de5f196530 Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"
Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM

This reverts commit r329287

llvm-svn: 329486
2018-04-07 05:36:44 +00:00
Artem Belevich f256decdc4 [NVPTX] add support for initializing fp16 arrays.
Previously HalfTy was not handled, which would either trigger an assertion
or result in an array initialized with garbage.

Differential Revision: https://reviews.llvm.org/D45391

llvm-svn: 329463
2018-04-06 22:25:08 +00:00
Artem Belevich a28e598ebb [NVPTX] Fixed vectorized LDG for f16.
v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16
and that was not previously handled correctly by tryLDGLDU().

Differential Revision: https://reviews.llvm.org/D45339

llvm-svn: 329456
2018-04-06 21:10:24 +00:00
Sameer AbuAsal c1b0e66b58 [RISCV] Tablegen-driven Instruction Compression.
Summary:

    This patch implements a tablegen-driven Instruction Compression
    mechanism for generating RISCV compressed instructions
    (C Extension) from the expanded instruction form.

    This tablegen backend processes CompressPat declarations in a
    td file and generates all the compile-time and runtime checks
    required to validate the declarations, validate the input
    operands and generate correct instructions.

    The checks include validating register operands, immediate
    operands, fixed register operands and fixed immediate operands.

    Example:
      class CompressPat<dag input, dag output> {
        dag Input  = input;
        dag Output = output;
        list<Predicate> Predicates = [];
      }

      let Predicates = [HasStdExtC] in {
      def : CompressPat<(ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2),
                        (C_ADD GPRNoX0:$rs1, GPRNoX0:$rs2)>;
      }

    The result is an auto-generated header file
    'RISCVGenCompressEmitter.inc' which exports two functions for
    compressing/uncompressing MCInst instructions, plus
    some helper functions:

      bool compressInst(MCInst& OutInst, const MCInst &MI,
                        const MCSubtargetInfo &STI,
                        MCContext &Context);

      bool uncompressInst(MCInst& OutInst, const MCInst &MI,
                          const MCRegisterInfo &MRI,
                          const MCSubtargetInfo &STI);

    The clients that include this auto-generated header file and
    invoke these functions can compress an instruction before emitting
    it, in the target-specific ASM or ELF streamer, or can uncompress
    an instruction before printing it, when the expanded instruction
    format is favored.

    The following clients were added to implement compression/uncompression
    for RISCV:

    1) RISCVAsmParser::MatchAndEmitInstruction:
       Inserted a call to compressInst() to compress instructions
       parsed by llvm-mc coming from an ASM input.
    2) RISCVAsmPrinter::EmitInstruction:
       Inserted a call to compressInst() to compress instructions that
       were lowered from Machine Instructions (MachineInstr).
    3) RVInstPrinter::printInst:
       Inserted a call to uncompressInst() to print the expanded
       version of the instruction instead of the compressed one (e.g.,
       add s0, s0, a5 instead of c.add s0, a5) when -riscv-no-aliases
       is not passed.
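To make the `C_ADD` pattern above concrete, here is a hand-written C++ sketch of the check and the 16-bit encoding it would lead to. `ToyInst`, `compressAdd` and `encodeCAdd` are hypothetical and are not the generated `compressInst()`; the encoding follows the standard CR format (funct4 `1001`, rd/rs1, rs2, op `10`).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Opc { ADD, C_ADD };

// Hypothetical toy instruction; not llvm::MCInst.
struct ToyInst {
  Opc Op;
  unsigned Rd, Rs1, Rs2; // register numbers for x0..x31
};

// Mirrors the CompressPat above: ADD rd, rd, rs2 -> C.ADD rd, rs2, where
// rd and rs2 must both be in GPRNoX0 (x1..x31).
std::optional<ToyInst> compressAdd(const ToyInst &MI) {
  if (MI.Op != Opc::ADD || MI.Rd != MI.Rs1) // rs1 must be tied to rd
    return std::nullopt;
  bool RdOk = MI.Rd >= 1 && MI.Rd <= 31;
  bool Rs2Ok = MI.Rs2 >= 1 && MI.Rs2 <= 31;
  if (!RdOk || !Rs2Ok)
    return std::nullopt;
  return ToyInst{Opc::C_ADD, MI.Rd, MI.Rd, MI.Rs2};
}

// CR-format encoding of C.ADD: funct4=1001 | rd/rs1 | rs2 | op=10.
uint16_t encodeCAdd(const ToyInst &CI) {
  assert(CI.Op == Opc::C_ADD && CI.Rd == CI.Rs1);
  return static_cast<uint16_t>(0x9000u | (CI.Rd << 7) | (CI.Rs2 << 2) | 0x2u);
}
```

With s0 = x8 and a5 = x15, `add s0, s0, a5` compresses to `c.add s0, a5`, which this sketch encodes as `0x943E`.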

This patch squashes D45119, D42780 and D41932. It was reviewed in smaller patches by
asb, efriedma, apazos and mgrang.

Reviewers: asb, efriedma, apazos, llvm-commits, sabuasal

Reviewed By: sabuasal

Subscribers: mgorny, eraman, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng

Differential Revision: https://reviews.llvm.org/D45385

llvm-svn: 329455
2018-04-06 21:07:05 +00:00
Matt Davis 13b8331054 [StackProtector] Ignore certain intrinsics when calculating sspstrong heuristic.
Summary:
The 'strong' StackProtector heuristic takes into consideration call instructions.
Certain intrinsics, such as lifetime.start, can cause the
StackProtector to protect functions that do not need to be protected.

Specifically, a volatile variable (not optimized away) belonging to a stack
allocation will encourage an llvm.lifetime.start to be inserted during
compilation. Because that intrinsic is a 'call' the strong StackProtector
will see that the alloca'd variable is being passed to a call instruction, and
insert a stack protector. In this case the intrinsic isn't really lowered to a
call. This can cause unnecessary stack checking, at the cost of additional
(wasted) CPU cycles.
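A deliberately simplified C++ model of the heuristic change, with hypothetical types: a call that receives a stack object's address still counts as an escape, except when the callee is an intrinsic that is not really lowered to a call. The exclusion list here is illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of how a stack object is used.
enum class UseKind { Load, Store, Call };
struct AllocaUse {
  UseKind Kind;
  std::string Callee; // only meaningful for Call
};

// Illustrative list of intrinsics that are not really lowered to calls.
bool isIgnoredIntrinsic(const std::string &Callee) {
  return Callee == "llvm.lifetime.start" || Callee == "llvm.lifetime.end" ||
         Callee == "llvm.dbg.declare" || Callee == "llvm.dbg.value";
}

// sspstrong-style rule: passing the object's address to a real call counts
// as an escape that warrants a stack protector.
bool addressEscapesToCall(const std::vector<AllocaUse> &Uses) {
  for (const AllocaUse &U : Uses)
    if (U.Kind == UseKind::Call && !isIgnoredIntrinsic(U.Callee))
      return true;
  return false;
}
```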

In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of
now that routine considers all intrinsics as not being lowerable. That needs
to be corrected, and such a change is on my list of things to get moving on.

As a side note, the updated stack-protector-dbginfo.ll test always seems to
pass. I never see the dbg.declare/dbg.value reaching the
StackProtector::HasAddressTaken, but I don't see any code excluding dbg
intrinsic calls either, so I think it's the safest thing to do.

Reviewers: void, timshen

Reviewed By: timshen

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45331

llvm-svn: 329450
2018-04-06 20:14:13 +00:00
Krzysztof Parzyszek ed04f02432 [Hexagon] Handle subregisters when calculating iteration count in HW loops
llvm-svn: 329434
2018-04-06 17:51:57 +00:00
Simon Pilgrim 389dc7f0c8 Add additional tests from D45336
llvm-svn: 329427
2018-04-06 17:18:44 +00:00
Simon Pilgrim 1e6659c1f0 Add additional tests from D45366
llvm-svn: 329425
2018-04-06 17:15:56 +00:00
Craig Topper f0d042619b [X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this, I've added the store address port to RMW and used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.

Apparently we were inconsistent about whether the store has latency or not, hence the test changes.

I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.

Reviewers: RKSimon, andreadb

Reviewed By: andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45351

llvm-svn: 329416
2018-04-06 16:16:48 +00:00
Simon Pilgrim 09eeb3a8b9 [X86][SandyBridge] Add (V)DPPS memory fold latencies
Noticed this during D44654

llvm-svn: 329389
2018-04-06 11:25:21 +00:00
Simon Pilgrim 8a83f16ccd [X86][SandyBridge] SBWriteResPair +5cy Memory Folds
As mentioned on D44647, this patch increases the default memory latency to +5cy, which more closely matches what most custom cases are doing for reg-mem instructions.

I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.

As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...

Differential Revision: https://reviews.llvm.org/D44654

llvm-svn: 329388
2018-04-06 11:00:51 +00:00
Francis Visoiu Mistrih 537d7eee90 [MIR] Add support for MachineFrameInfo::LocalFrameSize
MFI.LocalFrameSize was not serialized.

It is usually set from LocalStackSlotAllocation, so if that pass doesn't
run it is impossible do deduce it from the stack objects. Until now, this
information was lost.

llvm-svn: 329382
2018-04-06 08:56:25 +00:00
Hiroshi Inoue a2eefb6d9a [PowerPC] allow D-form VSX load/store when accessing FrameIndex without offset
VSX D-form load/store instructions of POWER9 require the offset to be a multiple of 16, and a helper `isOffsetMultipleOf` is used to check this.
So far, the helper handles the FrameIndex + offset case, but not the FrameIndex-without-offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on the stack.
For example, an X-form store (stxvx) is used for int a[4] = {0}; instead of a D-form store (stxv). For larger arrays, a D-form instruction is not used when accessing the first 16 bytes. Using D-form instructions reduces register pressure as well as the number of instructions.
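A hypothetical C++ model of the check (the `FrameAddr` struct and `toyIsOffsetMultipleOf` are assumptions, not the PowerPC backend's code): a bare frame index is treated as offset 0, which is a multiple of 16 as long as the stack object itself is 16-byte aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical address model: a stack object (frame index) plus an optional
// constant offset. A bare frame index has no offset at all.
struct FrameAddr {
  unsigned ObjectAlign;          // alignment of the stack object, in bytes
  std::optional<int64_t> Offset; // absent for a bare frame index
};

// D-form (DQ) VSX loads/stores need a displacement that is a multiple of 16.
// Treating a missing offset as 0 lets a bare frame index qualify, provided
// the object itself is suitably aligned.
bool toyIsOffsetMultipleOf(const FrameAddr &A, unsigned Val) {
  if (A.ObjectAlign % Val != 0)
    return false;
  return A.Offset.value_or(0) % Val == 0;
}
```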

Differential Revision: https://reviews.llvm.org/D45079

llvm-svn: 329377
2018-04-06 05:41:16 +00:00
Zvi Rackover 78a065ff16 X86 Tests: Add a case for combining sdiv by a splatted pow2 negative. NFC.
Noticed test was missing while working on D42479.

llvm-svn: 329356
2018-04-05 21:57:20 +00:00
Craig Topper fbe3132f67 [X86] Separate CDQ and CDQE in the scheduler model.
According to Agner's data, CDQE is closer to CWDE.
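For reference, the architectural semantics of the three instructions in scalar C++ (timing is not modeled; function names are hypothetical). CDQE and CWDE each sign-extend within a single destination register, whereas CDQ writes the sign into a second register, EDX.

```cpp
#include <cassert>
#include <cstdint>

// CWDE: sign-extend AX into EAX (one destination register).
uint32_t cwde(uint16_t AX) { return uint32_t(int32_t(int16_t(AX))); }

// CDQE: sign-extend EAX into RAX (one destination register, like CWDE).
uint64_t cdqe(uint32_t EAX) { return uint64_t(int64_t(int32_t(EAX))); }

// CDQ: sign-extend EAX into EDX:EAX. EAX is unchanged; EDX receives copies
// of the sign bit, so the result lands in a second register.
uint32_t cdqEDX(uint32_t EAX) { return (EAX & 0x80000000u) ? 0xFFFFFFFFu : 0u; }
```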

llvm-svn: 329354
2018-04-05 21:56:19 +00:00
Craig Topper 4cc3827791 [X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model
llvm-svn: 329351
2018-04-05 21:40:32 +00:00
Craig Topper 3b0b96c591 [X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.

The Sandy Bridge version was missing a load port use.

llvm-svn: 329347
2018-04-05 21:16:26 +00:00
Simon Pilgrim 9b41cac3e9 [X86][SSE] Add floating point add/mul fast-math vector.reduce tests
Strict versions aren't working at all (PR36732) and the accumulators aren't supported (PR36734)

llvm-svn: 329344
2018-04-05 21:01:21 +00:00
Simon Pilgrim 806252fab0 [X86][SSE] Add floating point min/max vector.reduce tests
llvm-svn: 329343
2018-04-05 20:54:55 +00:00
Konstantin Zhuravlyov c233ae8004 AMDGPU/Metadata: Always report a fixed number of hidden arguments
Currently it is 6. If the "feature" was not used, report a dummy
hidden argument. Otherwise it does not match the kernarg size
reported in the kernel header.

Differential Revision: https://reviews.llvm.org/D45129

llvm-svn: 329341
2018-04-05 20:46:04 +00:00
Craig Topper c6bb36a3d0 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329339
2018-04-05 20:04:06 +00:00
Craig Topper 9eec2025c5 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

llvm-svn: 329330
2018-04-05 18:38:45 +00:00
Simon Pilgrim 7f6f43fa3e [X86][SSE] Add integer add/mul vector.reduce tests
llvm-svn: 329321
2018-04-05 17:37:35 +00:00
Simon Pilgrim de5d0ffe47 [X86][SSE] Add integer and/or/xor vector.reduce tests
llvm-svn: 329320
2018-04-05 17:29:51 +00:00
Simon Pilgrim 57d324082c [X86][SSE] Add integer min/max vector.reduce tests
llvm-svn: 329319
2018-04-05 17:25:40 +00:00
Sam Clegg cfd44a2e69 [WebAssembly] Allow for the creation of user-defined custom sections
This patch adds a way for users to create their own custom sections to
be added to wasm files. At the LLVM IR layer, they are defined through
the "wasm.custom_sections" named metadata. The expected use case for
this is bindings generators such as wasm-bindgen.

Patch by Dan Gohman

Differential Revision: https://reviews.llvm.org/D45297

llvm-svn: 329315
2018-04-05 17:01:39 +00:00
Tim Northover b30388bf11 ARM: Do not spill CSR to stack on entry to noreturn functions
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch by myeisha (pmb).

llvm-svn: 329287
2018-04-05 14:26:06 +00:00
Sam Parker 0e7deb8104 [DAGCombine] Revert r329160
Again, broke the big endian stage 2 builders.

llvm-svn: 329283
2018-04-05 13:46:17 +00:00
Simon Dardis 47e66335ec [mips] Regenerate test before posting patch for constant multiplication (NFC)
llvm-svn: 329268
2018-04-05 10:30:17 +00:00
Craig Topper 15303dda0d [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

llvm-svn: 329256
2018-04-05 05:19:36 +00:00
Craig Topper 25c7110a37 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

llvm-svn: 329254
2018-04-05 04:42:03 +00:00
Craig Topper 6c4e08c835 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

llvm-svn: 329252
2018-04-05 04:42:01 +00:00
Craig Topper 5c36557426 [X86] Auto-generate complete checks. NFC
llvm-svn: 329251
2018-04-05 04:41:59 +00:00
Puyan Lotfi d6f7313c8f [MIR-Canon] Improving performance by switching to named vregs.
No more skipping thousands of vregs. Much faster running time.

llvm-svn: 329246
2018-04-05 00:27:15 +00:00
Puyan Lotfi 26c504fe1e [MIR-Canon] Adding support for multi-def -> user distance reduction.
llvm-svn: 329243
2018-04-05 00:08:15 +00:00
Peter Collingbourne f11eb3ebe7 AArch64: Implement support for the shadowcallstack attribute.
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.

Differential Revision: https://reviews.llvm.org/D45239

llvm-svn: 329236
2018-04-04 21:55:44 +00:00
Craig Topper 498875fab0 [X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existent BSWAP16r.

llvm-svn: 329211
2018-04-04 17:54:19 +00:00
Lei Huang 09fda63af0 [Power9]Legalize and emit code for quad-precision fma instructions
Legalize and emit code for the following quad-precision fma:

  * xsmaddqp
  * xsnmaddqp
  * xsmsubqp
  * xsnmsubqp

Differential Revision: https://reviews.llvm.org/D44843

llvm-svn: 329206
2018-04-04 16:43:50 +00:00
Nicolai Haehnle 2f5a73820c AMDGPU: Dimension-aware image intrinsics
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.

This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.

Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.

v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI

Change-Id: I099f309e0a394082a5901ea196c3967afb867f04

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44939

llvm-svn: 329166
2018-04-04 10:58:54 +00:00
Nicolai Haehnle 3ffd383a15 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

llvm-svn: 329164
2018-04-04 10:57:58 +00:00
John Brawn 21d9b33d62 [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)
Differential Revision: https://reviews.llvm.org/D44573

llvm-svn: 329163
2018-04-04 10:12:53 +00:00
Sam Parker 7ec722d603 [DAGCombine] Improve ReduceLoadWidth for SRL
Recommitting rL321259. Previously this caused an issue with PPCBE but
I didn't receive a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.

Original commit message:

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
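The narrowing can be checked on concrete bytes. On a little-endian target, shifting a wide load right by a whole number of bytes and masking the result gives the same value as a narrower load from a later address. This is a minimal Python sketch, assuming a little-endian layout, a 4-byte wide load, and shift and mask widths that are multiples of 8; both function names are hypothetical.

```python
def wide_load_srl_and(buf: bytes, shift: int, mask_bits: int) -> int:
    """(and (srl (load i32), shift), mask): the pattern before the combine."""
    word = int.from_bytes(buf[:4], "little")
    return (word >> shift) & ((1 << mask_bits) - 1)


def narrow_load(buf: bytes, shift: int, mask_bits: int) -> int:
    """The narrowed load: read mask_bits bits starting shift bits in.

    Assumes little-endian memory, byte-multiple shift and mask_bits,
    and shift + mask_bits <= 32.
    """
    assert shift % 8 == 0 and mask_bits % 8 == 0
    assert shift + mask_bits <= 32
    start = shift // 8
    return int.from_bytes(buf[start:start + mask_bits // 8], "little")
```

On a big-endian target the byte offset is counted from the other end of the word, so the same rewrite needs a different address adjustment.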

Differential Revision: https://reviews.llvm.org/D41350

llvm-svn: 329160
2018-04-04 09:26:56 +00:00
Vlad Tsyrklevich e3446017ed Add the ShadowCallStack pass
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
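The prolog/epilog scheme described above can be modeled in a few lines. This is a minimal Python sketch, not the pass itself: the `ShadowRegion` class, the 8-byte slot size, and the order in which the offset is bumped are illustrative assumptions. Only the idea that [gs:0] holds the current offset and that the epilog compares the saved return address against the one on the stack comes from the description.

```python
class ShadowRegion:
    """Hypothetical model of the memory addressed through %gs.

    Slot 0 holds the current offset; saved return addresses live at
    higher offsets. The 8-byte slot size is an assumption.
    """

    SLOT = 8

    def __init__(self):
        self.mem = {0: 0}  # [gs:0] starts at offset 0

    def prolog(self, return_address: int) -> None:
        # Bump the offset stored in [gs:0], then save the return
        # address at [gs:offset].
        offset = self.mem[0] + self.SLOT
        self.mem[0] = offset
        self.mem[offset] = return_address

    def epilog(self, return_address_on_stack: int) -> None:
        # Load the saved copy, pop the offset, and compare it with the
        # (possibly corrupted) return address on the regular stack.
        offset = self.mem[0]
        saved = self.mem[offset]
        self.mem[0] = offset - self.SLOT
        if saved != return_address_on_stack:
            raise RuntimeError("shadow call stack mismatch")
```

Nested calls push and pop in LIFO order, so a mismatch is reported for the frame whose on-stack return address was overwritten.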

Reviewers: pcc, vitalybuka

Reviewed By: pcc

Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44802

llvm-svn: 329139
2018-04-04 01:21:16 +00:00
Jessica Paquette 5fa2a63785 [MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.

Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.

This also adds a new test for the new redzone behaviour.

llvm-svn: 329134
2018-04-03 23:32:41 +00:00
Farhana Aleen e80aeac0f2 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

llvm-svn: 329131
2018-04-03 23:00:30 +00:00
Jessica Paquette 642f6c61a3 [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.

This removes the requirement to pass -mno-red-zone when outlining for AArch64.

https://reviews.llvm.org/D45189

llvm-svn: 329120
2018-04-03 21:56:10 +00:00
Farhana Aleen 936947349a Revert "MSG"
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.

This was committed by mistake.

llvm-svn: 329119
2018-04-03 21:51:45 +00:00
Jessica Paquette d506bf8e3d [MachineOutliner][NFC] Make outlined functions have internal linkage
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.

This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
 

llvm-svn: 329116
2018-04-03 21:36:00 +00:00
Farhana Aleen 3ab409dc86 MSG
llvm-svn: 329114
2018-04-03 21:20:39 +00:00
Sanjay Patel 223ef402c9 [x86] add tests for convert-FP-to-integer with constants; NFC
We don't constant fold any of these, but we could...but if we
do, we must produce the right answer.

Unlike the IR fptosi instruction or its DAG node counterpart 
ISD::FP_TO_SINT, these are not undef for an out-of-range input.
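For reference, the x86 truncating conversions return the "integer indefinite" value (0x80000000 for a 32-bit result) when the input is NaN, infinite, or out of range, per the Intel SDM; any constant folding would have to reproduce that. The Python model of a 32-bit CVTTSD2SI below is an illustrative sketch, and the function name is hypothetical.

```python
import math

INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def cvttsd2si32(x: float) -> int:
    """Model of a truncating double -> int32 conversion on x86."""
    if math.isnan(x) or math.isinf(x):
        return INT32_MIN  # "integer indefinite" (0x80000000)
    t = math.trunc(x)  # round toward zero
    if t < INT32_MIN or t > INT32_MAX:
        return INT32_MIN  # out of range: also integer indefinite
    return t
```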

llvm-svn: 329100
2018-04-03 18:34:56 +00:00
Krzysztof Parzyszek 45ac73f71a [Hexagon] Remove unneeded attributes from lit test
llvm-svn: 329078
2018-04-03 16:05:20 +00:00
Chandler Carruth ff2f4fcd51 [x86] Fix a pretty obvious think-o with my asm scrubbing. You have to in
fact use regular expression syntax to use regular expressions.

Should restore the bots. Sorry for the noise on this test.

Thanks to Philip for spotting the bug!

llvm-svn: 329057
2018-04-03 10:28:56 +00:00
Chandler Carruth 44a791a57a [x86] Clean up and enhance a test around eflags copying.
This adds the basic test cases from all the EFLAGS bugs in more direct
forms. It also switches to generated check lines, and includes both
32-bit and 64-bit variations.

No functionality changing here, just setting things up to have a nice
clean asm diff in my EFLAGS patch.

llvm-svn: 329056
2018-04-03 10:04:37 +00:00
Chandler Carruth 6646becd0c [x86] Extend my goofy SP offset scrubbing for llc test cases to actually
do explicit scrubbing of the offsets of stack spills and reloads.

You can always turn this off in order to test specific stack slot usage.
We were already hiding most of this, but the new logic hides it more
generically. Notably, we should effectively hide stack slot churn in
functions that have a frame pointer now, and should also hide it when
changing a function from stack pointer to frame pointer. That transition
already changes enough to be clearly noticed in the test case diff,
showing *every* spill and reload is really noisy without benefit. See
the test case I ran this on as a classic example.
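The kind of scrubbing described above can be sketched with a regular expression that rewrites frame-relative offsets only on lines annotated as spills or reloads. This is a hypothetical Python illustration of the idea, not the code in the test-update script. The `# N-byte Spill`/`Reload` comments match what llc prints on x86, while the pattern and the `{{...}}` replacement are assumptions.

```python
import re

# Match an (%rsp)/(%rbp)/(%esp)/(%ebp)-relative offset, but only on a
# line whose trailing comment marks it as a stack spill or reload.
SPILL_OFFSET_RE = re.compile(
    r"-?\d+\(%[re][sb]p\)(?=.*#\s*\d+-byte (?:Spill|Reload|Folded Reload))"
)


def scrub_spill_offsets(line: str) -> str:
    """Replace concrete stack-slot offsets with a FileCheck regex."""
    return SPILL_OFFSET_RE.sub("{{[-0-9]+}}(%{{[re][sb]}}p)", line)
```

Lines without a spill/reload annotation pass through unchanged, so ordinary stack accesses can still be checked exactly.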

llvm-svn: 329055
2018-04-03 09:57:05 +00:00
Yonghong Song d3b522f519 bpf: fix incorrect SELECT_CC lowering
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
  (1). In the code, swapping the operands alone is not enough; the
       ConditionalCode CC must also be swapped, otherwise incorrect
       code will be generated.
  (2). The ConditionalCode swap should be subject to
       getHasJmpExt(). If getHasJmpExt() is false, certain
       conditional codes are not supported and the swap
       may generate incorrect code.
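Point (1) is the general rule that exchanging the operands of a comparison requires exchanging the condition code as well. A small Python sketch of that mapping follows; the `SWAPPED` table, the condition names, and the helper functions are illustrative, not the BPF backend's code. Point (2), which depends on which condition codes the target supports, is not modeled.

```python
import operator

# Condition code -> predicate.
PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}

# Condition to use after swapping operands: (a CC b) == (b SWAPPED[CC] a).
SWAPPED = {"eq": "eq", "ne": "ne", "lt": "gt", "le": "ge", "gt": "lt", "ge": "le"}


def select_cc(cc: str, lhs: int, rhs: int, t: int, f: int) -> int:
    """Return t if (lhs cc rhs) holds, else f."""
    return t if PREDICATES[cc](lhs, rhs) else f


def select_cc_swapped(cc: str, lhs: int, rhs: int, t: int, f: int) -> int:
    # Swap the operands *and* the condition code.
    return select_cc(SWAPPED[cc], rhs, lhs, t, f)
```

Swapping only the operands, as in `select_cc("lt", rhs, lhs, ...)`, inverts the result for asymmetric conditions such as `lt`.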

The original goal for this patch is to optimize jmp operations
when JmpExt is not turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
2018-04-03 03:56:37 +00:00
Chandler Carruth 72eb30f7b3 [x86] Tidy up test case, generate check lines with script. NFC.
Just adds basic block labels and tidies up where comments go in the test
case and then generates fresh CHECK lines with the script. This way, the
check lines are much easier to maintain. They were already close to this
but not quite there.

llvm-svn: 329040
2018-04-03 02:19:05 +00:00
Rafael Espindola 8c58750cc4 Align stubs for external and common global variables to pointer size.
This patch fixes PR36885: clang++ generates unaligned stub symbol
holding a pointer.

Patch by Rahul Chaudhry!

llvm-svn: 329030
2018-04-02 23:20:30 +00:00
Lama Saba 927468309f [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to obtain the data directly instead of accessing it from cache or memory.
A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of store forward block on the Intel Core microarchitecture is a small store that cannot be forwarded to a larger load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
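As a rough illustration, the Python sketch below uses a deliberately simplified model in which a load can be forwarded from a store only if it lies entirely within that store, and splits a wide copy at the boundaries of earlier overlapping stores. The model, the `(offset, size)` representation, and the function names are assumptions for illustration; they are not the pass's actual splitting rules.

```python
from typing import List, Tuple

Access = Tuple[int, int]  # (byte offset, size in bytes)


def can_forward(store: Access, load: Access) -> bool:
    """Simplified rule: the load must lie entirely inside the store."""
    s_off, s_size = store
    l_off, l_size = load
    return s_off <= l_off and l_off + l_size <= s_off + s_size


def split_copy(load: Access, stores: List[Access]) -> List[Access]:
    """Split a wide load at every store boundary that falls inside it."""
    l_off, l_size = load
    end = l_off + l_size
    cuts = {l_off, end}
    for s_off, s_size in stores:
        for boundary in (s_off, s_off + s_size):
            if l_off < boundary < end:
                cuts.add(boundary)
    points = sorted(cuts)
    return [(a, b - a) for a, b in zip(points, points[1:])]
```

For example, a 16-byte XMM copy that reads back a 4-byte store at offset 4 becomes chunks `(0, 4)`, `(4, 4)`, and `(8, 8)`, and the middle chunk can then be forwarded.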

Differential revision: https://reviews.llvm.org/D41330

Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
2018-04-02 13:48:28 +00:00
Craig Topper 96729cd64b [X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.

llvm-svn: 328962
2018-04-02 06:34:16 +00:00
Craig Topper 6a814904da [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/

llvm-svn: 328961
2018-04-02 05:54:34 +00:00
Craig Topper 8104f266a4 [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

llvm-svn: 328960
2018-04-02 05:33:28 +00:00
Craig Topper dc74094398 [X86] Fix the SchedRW for AVX512 shift instructions.
It was being inadvertently defaulted to an FADD scheduler class.

llvm-svn: 328959
2018-04-02 03:15:02 +00:00
Craig Topper caec723a1a [X86] Add an itinerary to BTR64rr.
llvm-svn: 328956
2018-04-02 01:12:34 +00:00
Petr Hosek 934e5d5436 [AArch64] Reserve x18 register on Fuchsia
This register is reserved as a platform register on Fuchsia.

Differential Revision: https://reviews.llvm.org/D45105

llvm-svn: 328950
2018-04-01 23:44:04 +00:00
Craig Topper db6caabccc [X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted.
llvm-svn: 328930
2018-04-01 06:29:28 +00:00
Craig Topper 3998041e80 [X86] Add test case to show failure to promote i16 subtract when the LHS is a load and the result is stored to a different address.
We mistakenly believe we might be able to fold this as a RMW operation, but that doesn't end up happening.

llvm-svn: 328929
2018-04-01 06:29:27 +00:00
Craig Topper ae2de57db0 [X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored.
llvm-svn: 328928
2018-04-01 06:29:25 +00:00
Craig Topper 280f631350 [X86] Add test case to show failure to promote i16 subtract because we mistakenly believe the load can be folded. NFC
The left hand side of the subtract is a load, but we can't fold those unless we also have a store.

llvm-svn: 328927
2018-04-01 06:29:23 +00:00
Sanjay Patel 6124cae8f7 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, 
so replace a pair of casts with the equivalent node. We don't have to account for 
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 328921
2018-03-31 17:55:44 +00:00
Simon Pilgrim 3b8ad346f9 [X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries
llvm-svn: 328918
2018-03-31 09:15:54 +00:00
Puyan Lotfi 57c4f38c35 [MIR-Canon] Adding support for local idempotent instruction hoisting.
llvm-svn: 328915
2018-03-31 05:48:51 +00:00
Craig Topper 13a0f83a05 [X86] Add SchedRW for PMULLD
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies, often using a multi-uop flow. Silvermont in particular has a 7-uop flow with 11-cycle throughput. Sandy Bridge implements it as a single uop with 5-cycle latency and 1-cycle throughput, but Haswell and later use 2 uops with 10-cycle latency and 2-cycle throughput.

This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.

I also left a FIXME in the SandyBridge model because its use for the "generic" model is too optimistic for the 256/512-bit versions, since those are multiple uops on all known CPUs.

Reviewers: RKSimon, GGanesh, courbet

Reviewed By: RKSimon

Subscribers: gchatelet, gbedwell, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D44972

llvm-svn: 328914
2018-03-31 04:54:32 +00:00
Krzysztof Parzyszek 526fbf8e33 [Hexagon] Fix testcase
llvm-svn: 328899
2018-03-30 19:46:28 +00:00
Krzysztof Parzyszek 0f983d69a4 [Hexagon] Avoid creating invalid offsets in packetizer
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-increment) can be
packetized together after the offset on the second was updated to the
increment value. Make sure that the new offset is valid for the
instruction.

llvm-svn: 328897
2018-03-30 19:28:37 +00:00
Puyan Lotfi 399b46c98d [MIR] Adding support for Named Virtual Registers in MIR.
llvm-svn: 328887
2018-03-30 18:15:54 +00:00
Stanislav Mekhanoshin 74e2974ac6 [AMDGPU] Fixed some instructions latencies
Differential Revision: https://reviews.llvm.org/D45073

llvm-svn: 328874
2018-03-30 16:19:13 +00:00
Sanjay Patel e09b7dcf3d [SelectionDAG] Removing FABS folding from DAGCombiner
The code has bugs dealing with -0.0.

Since D44550 introduced FABS pattern folding in InstCombine, 
this patch removes the now-redundant code that causes 
https://bugs.llvm.org/show_bug.cgi?id=36600.

Patch by Mikhail Dvoretckii!

Differential Revision: https://reviews.llvm.org/D44683

llvm-svn: 328872
2018-03-30 15:42:52 +00:00
Krzysztof Parzyszek 46abcb236b [Hexagon] Fix printing :mem_noshuf on compiler-generated packets
llvm-svn: 328869
2018-03-30 15:09:05 +00:00
Michael Bedy 59e5ef793c [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

llvm-svn: 328856
2018-03-30 05:03:36 +00:00
Eli Friedman 208fe67a78 [MachineCopyPropagation] Handle COPY with overlapping source/dest.
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.

To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.

This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.
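Why overlap matters can be seen with a toy register file. In this Python sketch a tuple register such as Q0_Q1_Q2_Q3 is modeled as four consecutive entries in a list; the names, the helper function, and the read-all-then-write copy semantics are assumptions for illustration only.

```python
def copy_tuple(regs: list, dst: int, src: int, width: int = 4) -> None:
    """Copy `width` consecutive registers starting at src to dst.

    All source values are read before any destination is written,
    as a single COPY instruction would do.
    """
    values = regs[src:src + width]
    regs[dst:dst + width] = values
```

After a copy such as `Q1_Q2_Q3_Q4 = COPY Q0_Q1_Q2_Q3`, the source tuple no longer holds its old value, so treating it as still available would forward stale data.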

Differential Revision: https://reviews.llvm.org/D44911

llvm-svn: 328851
2018-03-30 00:56:03 +00:00
Matt Arsenault 03ae399d50 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.
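The add->or combine relies on the low bits of the base being zero: if `base` is a multiple of 16 and `0 <= k < 16`, then `base + k == base | k`. The Python sketch below uses a hypothetical helper and values to show the identity and how it fails when the assumed alignment does not actually hold, which is why the frame must really be realigned.

```python
def add_can_become_or(base: int, k: int, align: int) -> bool:
    """True when base + k == base | k is guaranteed by alignment alone.

    Assumes align is a power of two: an aligned base has its low
    log2(align) bits clear, so adding a small k produces no carries.
    """
    return base % align == 0 and 0 <= k < align
```

With `base = 0x44`, which is only 4-byte aligned, `base + 4` is `0x48` but `base | 4` is `0x44`.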

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.

llvm-svn: 328831
2018-03-29 21:30:06 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand, so we need to put ReadDefault for all the load ops first, followed by the ReadAfterLd.

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Matt Arsenault 6c041a3cab AMDGPU: Fix selection error on constant loads with < 4 byte alignment
llvm-svn: 328818
2018-03-29 19:59:28 +00:00
Paul Robinson 407ff1b1cd Try to fix a couple tests for Windows.
llvm-svn: 328814
2018-03-29 18:59:33 +00:00
Paul Robinson b271f31d8d Reapply "[DWARFv5] Emit file 0 to the line table."
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files.  This makes the line table more
independent of the .debug_info section.
We emit the new syntax only for DWARF v5 and later.

Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows.  Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.

Differential Revision: https://reviews.llvm.org/D44054

llvm-svn: 328805
2018-03-29 17:16:41 +00:00
Krzysztof Parzyszek dc7a557e6a [Hexagon] Add support to handle bit-reverse load intrinsics
Patch by Sumanth Gundapaneni.

llvm-svn: 328774
2018-03-29 13:52:46 +00:00
Jun Bum Lim f90fe701ef [PostRAMachineSink] preserve CFG
Summary: Mark the CFG as preserved since this pass does not make any changes to the CFG.

Reviewers: sebpop, mzolotukhin, mcrosier

Reviewed By: mzolotukhin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44845

llvm-svn: 328727
2018-03-28 19:56:26 +00:00
Krzysztof Parzyszek 440ba3ae5c [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate an M0/CS0 or M1/CS1 pair, but we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.

There is a related clang patch.

Patch by Brendon Cahoon.

llvm-svn: 328724
2018-03-28 19:38:29 +00:00
Jessica Paquette 4aa14dbcc2 [MachineOutliner] Simplify call outlining + require valid callee save info for call outlining
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.

llvm-svn: 328719
2018-03-28 17:52:31 +00:00
Simon Pilgrim 7237e0cf39 [X86][AVX2] Add shuffle test case from PR36933
llvm-svn: 328714
2018-03-28 16:48:48 +00:00
Alexander Potapenko 202f809437 Revert "Reapply "[DWARFv5] Emit file 0 to the line table.""
This reverts commit r328676.

Commit r328676 broke the -no-integrated-as flag necessary to build Linux kernel with Clang:

$ cat t.c
void foo() {}
$ clang -no-integrated-as   -c  t.c -g
/tmp/t-dcdec5.s: Assembler messages:
/tmp/t-dcdec5.s:8: Error: file number less than one
clang-7.0: error: assembler command failed with exit code 1 (use -v to see invocation)

llvm-svn: 328699
2018-03-28 12:36:46 +00:00
Tim Renouf cdac172e2a Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.

It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.

Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
2018-03-28 11:21:07 +00:00
Christof Douma a1e77c0e02 [ARM] Support float literals under XO
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.

Differential Revision: https://reviews.llvm.org/D44941

llvm-svn: 328691
2018-03-28 10:02:26 +00:00
Mikael Holmen 6c062b7641 [RegisterCoalescing] Don't move COPY if it would interfere with another value
Summary:
RegisterCoalescer::removePartialRedundancy tries to hoist B = A from
BB0/BB2 to BB1:

  BB1:
       ...
  BB0/BB2:  ----
       B = A;   |
       ...      |
       A = B;   |
         |-------
         |

It does so if a number of conditions are fulfilled. However, it failed
to check if B was used by any of the terminators in BB1. Since we must
insert B = A before the terminators (since it's not a terminator itself),
this means that we could erroneously insert a new definition of B before a
use of it.

Reviewers: wmi, qcolombet

Reviewed By: wmi

Subscribers: MatzeB, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D44918

llvm-svn: 328689
2018-03-28 06:01:30 +00:00
Sanjay Patel bb33007b25 [AArch64] add ftrunc tests; NFC
As suggested in D44909.

llvm-svn: 328683
2018-03-28 00:56:00 +00:00
Sanjay Patel 594c1546f1 [PowerPC] add ftrunc vector tests; NFC
Baseline tests for vectors as suggested in D44909.

llvm-svn: 328682
2018-03-28 00:49:12 +00:00
Paul Robinson 07480bd177 Reapply "[DWARFv5] Emit file 0 to the line table."
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files.  This makes the line table more
independent of the .debug_info section.

Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows.  Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.

Differential Revision: https://reviews.llvm.org/D44054

llvm-svn: 328676
2018-03-27 22:40:34 +00:00
Jessica Paquette 2519ee7081 [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.

This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.

llvm-svn: 328674
2018-03-27 22:23:48 +00:00
Tim Renouf e4208bfa5b [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
2018-03-27 21:35:00 +00:00
Paul Robinson 7cb26ad2ef [DWARF] Suppress split line tables more carefully.
If a given split type unit does not have source locations, don't have
it refer to the split line table.
If no split type unit refers to the split line table, don't emit the
line table at all.

This will save a little space on rare occasions, but also refactors
things a bit to improve which class is responsible for what.

Responding to review comments on r326395.

Differential Revision: https://reviews.llvm.org/D44220

llvm-svn: 328670
2018-03-27 21:28:59 +00:00
Tim Renouf 4db0960420 [CodeGen] Fixed unreachable with -print-machineinstrs and custom pseudo source value
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.

This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.

Differential Revision: https://reviews.llvm.org/D44871

Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66
llvm-svn: 328668
2018-03-27 21:14:04 +00:00
Simon Pilgrim a2f26788a3 [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.

Differential Revision: https://reviews.llvm.org/D44924

llvm-svn: 328664
2018-03-27 20:38:54 +00:00
Matt Arsenault 17f3338015 AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.

llvm-svn: 328659
2018-03-27 19:42:55 +00:00
Matt Arsenault 0a0c871f60 AMDGPU: Fix crash when MachinePointerInfo invalid
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.

llvm-svn: 328652
2018-03-27 18:39:45 +00:00
Matt Arsenault 126a874952 AMDGPU: Fix register name format in tests
These were changed to match the asm output name a long time ago,
although I think the old tablegenerated names still work.

llvm-svn: 328651
2018-03-27 18:39:42 +00:00
Matt Arsenault e9f3679031 AMDGPU: Fix FP restore from being reordered with stack ops
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.

Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.

Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to prevent this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.

llvm-svn: 328650
2018-03-27 18:38:51 +00:00
Krzysztof Parzyszek 0375cd46ef [Hexagon] Implement TTI::shouldMaximizeVectorBandwidth
llvm-svn: 328648
2018-03-27 18:10:47 +00:00
Stefan Pintilie 659f040351 [Power9] Fix the resource list for the COPY instruction.
The COPY instruction was listed as a 4 cycle instruction.
It is now listed correctly as a 2 cycle ALU instruction.

llvm-svn: 328647
2018-03-27 17:51:53 +00:00
Artur Pilipenko ca1d849cd6 Fix a recurring typo in load-combine tests
   %tmp = bitcast i32* %arg to i8*
   %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
-  %tmp2 = load i8, i8* %tmp, align 1
+  %tmp2 = load i8, i8* %tmp1, align 1

This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended.

llvm-svn: 328642
2018-03-27 17:33:50 +00:00
Krzysztof Parzyszek 52396bb9c5 Use .set instead of = when printing assignment in assembly output
On Hexagon "x = y" is a syntax used in most instructions, and is not
treated as a directive.

Differential Revision: https://reviews.llvm.org/D44256

llvm-svn: 328635
2018-03-27 16:44:41 +00:00
Simon Pilgrim 5f7ab4fedf [X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class
llvm-svn: 328620
2018-03-27 12:26:12 +00:00
Strahinja Petrovic 06cf6a6490 [PowerPC] Secure PLT support
This patch supports secure PLT mode for PowerPC 32 architecture.

Differential Revision: https://reviews.llvm.org/D42112

llvm-svn: 328617
2018-03-27 11:23:53 +00:00
Sanjay Patel 15f7df9f44 [x86] add RUN for target before roundss; NFC
llvm-svn: 328601
2018-03-27 00:32:19 +00:00
Sanjay Patel 8653776367 [x86] add tests for ftrunc; NFC
llvm-svn: 328592
2018-03-26 23:18:32 +00:00
Simon Pilgrim f6440b6fb1 Fix newlines. NFCI.
llvm-svn: 328583
2018-03-26 21:07:59 +00:00
Simon Pilgrim 28e7bcbba6 [X86] Add WriteCRC32 scheduler class
Currently CRC32 instructions use the WriteFAdd class; this patch splits them off into their own class. At the moment it is still mostly just a duplicate of WriteFAdd, but it can now be tweaked on a target-by-target basis.

Differential Revision: https://reviews.llvm.org/D44647

llvm-svn: 328582
2018-03-26 21:06:14 +00:00
Rafael Espindola 78fdca3cd5 Use local symbols for creating .stack-size.
llvm-svn: 328581
2018-03-26 20:40:22 +00:00
Reid Kleckner 41fb2dba9c [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32
Summary:
Re-lands r328386 and r328443, reverting r328482.

Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small
parameters in i8 and i16 do not end up in the SysV register parameters
(EDI, ESI, etc).

I added tests for how we receive small parameters, since that is the
important part. It's always safe to store more bytes than will be read,
but the assumptions you make when loading them are what really matter.

I also tested this by self-hosting clang and it passed tests on win64.

Reviewers: mstorsjo, hans

Subscribers: hiraditya, mstorsjo, llvm-commits

Differential Revision: https://reviews.llvm.org/D44900

llvm-svn: 328570
2018-03-26 18:49:48 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Krzysztof Parzyszek 5488deb1ab [Hexagon] Add more lit tests
llvm-svn: 328561
2018-03-26 17:53:48 +00:00
Lei Huang be0afb0870 [Power9] Legalize and emit code for quad-precision convert from double-precision
Legalize and emit code for the quad-precision floating-point operation xscvdpqp,
and add an option to guard quad-precision operation support.

Differential Revision: https://reviews.llvm.org/D44746

llvm-svn: 328558
2018-03-26 17:46:25 +00:00
Stefan Pintilie 26d4f923c4 [PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place.
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.

Differential Revision: https://reviews.llvm.org/D43086

llvm-svn: 328556
2018-03-26 17:39:18 +00:00
Simon Pilgrim 86ea53123d [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs
We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR.

llvm-svn: 328551
2018-03-26 17:02:02 +00:00
Krzysztof Parzyszek 9f041b1830 [Pipeliner] Add missing loop carried dependences
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.

The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.

The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.
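
A simplified sketch of the classification rule. It assumes each access is summarized by a single underlying-object name, which is empty when the object is not identified. `MemAccess` and `mayCarryDependence` are hypothetical; the real pipeliner works on MachineInstrs and their underlying objects. Unlike the message, which only spells out the case of an unknown store, this sketch also treats an unknown load conservatively.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of a memory access: the name of its underlying
// object, or an empty string when the object is not "identified".
struct MemAccess {
  bool isStore;
  std::string object; // "" stands for the "unknown" value
};

// Conservative loop-carried dependence test between a store and a pending
// load: identified objects are compared by name, and an unknown object on
// either side is assumed to alias anything.
bool mayCarryDependence(const MemAccess &store, const MemAccess &load) {
  assert(store.isStore && !load.isStore);
  if (store.object.empty() || load.object.empty())
    return true;
  return store.object == load.object;
}
```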

Patch by Brendon Cahoon.

llvm-svn: 328547
2018-03-26 16:50:11 +00:00
Krzysztof Parzyszek a212204453 [Pipeliner] Use latency to compute RecMII
The patch contains several changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.

The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a 
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.

The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
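
To make the 3-versus-2 difference concrete, here is a sketch of the standard modulo-scheduling bound RecMII = ceil(sum of latencies / sum of iteration distances) for one recurrence. It is contrasted with a count that charges one cycle per node, which is an assumed model of the old behavior. `RecEdge` and both functions are hypothetical, not MachinePipeliner code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical edge of one recurrence (a cycle in the dependence graph).
struct RecEdge {
  unsigned latency;  // cycles between the two instructions (0 for a copy)
  unsigned distance; // iterations the edge spans (0 = same iteration)
};

// Node-count estimate: every node in the cycle costs one cycle.
// In a cycle there is one edge per node, so cycle.size() is the node count.
unsigned recMIIByNodeCount(const std::vector<RecEdge> &cycle) {
  unsigned dist = 0;
  for (const RecEdge &e : cycle)
    dist += e.distance;
  assert(dist > 0 && "a recurrence must be loop carried");
  unsigned nodes = static_cast<unsigned>(cycle.size());
  return (nodes + dist - 1) / dist; // ceil(nodes / dist)
}

// Latency estimate: ceil(sum of latencies / sum of distances).
unsigned recMIIByLatency(const std::vector<RecEdge> &cycle) {
  unsigned lat = 0, dist = 0;
  for (const RecEdge &e : cycle) {
    lat += e.latency;
    dist += e.distance;
  }
  assert(dist > 0 && "a recurrence must be loop carried");
  return (lat + dist - 1) / dist;
}
```

Consider a recurrence A -> B -> COPY -> A, where the COPY edge is the loop-carried one and has latency 0. It has three nodes but only two cycles of latency, so the two estimates give 3 and 2 respectively.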

The node order algorithm was not choosing the last instruction
in the recurrence for a bottom-up traversal. This happened when the
last instruction is a copy. When choosing the instruction, a check on
NodeNum was added to break ties when maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.

The cost computation in the Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.

Patch by Brendon Cahoon.

llvm-svn: 328542
2018-03-26 16:33:16 +00:00
Simon Pilgrim 8815105cd5 [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs
llvm-svn: 328541
2018-03-26 16:24:13 +00:00
Krzysztof Parzyszek f13bbf1d58 [Pipeliner] Fix assert caused by pipeliner serialization
The pipeliner is asserting because the serialization step that
occurs at the end is deleting an instruction. The assert
occurs later on because there is a use without a definition.

The problem occurs when an instruction defines a value used
by a REG_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put into the same packet. The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.

There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently from regular instructions. Unfortunately, this means
the resulting order is not correct.

This patch simplifies the code by merging the separate steps for
handling zero-cost and regular instructions. Only phis are
handled separately now, since they should occur first. Then, this
patch adds checks to make sure MoveUse is set to the smallest
value if there are multiple uses in a cycle.

Patch by Brendon Cahoon.

llvm-svn: 328540
2018-03-26 16:23:29 +00:00
Krzysztof Parzyszek 8e1363df4e [Pipeliner] Fix check for order dependences when finalizing instructions
The code in orderDependences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.

Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.

Patch by Brendon Cahoon.

llvm-svn: 328516
2018-03-26 16:05:55 +00:00
Krzysztof Parzyszek 3a0a15afe7 [Pipeliner] Fix in the pipeliner phi reuse code
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.

Patch by Brendon Cahoon.

llvm-svn: 328510
2018-03-26 15:58:16 +00:00
Krzysztof Parzyszek 785b6cec11 [Pipeliner] Correctly update memoperands in the epilog
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.

This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.
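
A sketch of why an "unknown" size is conservative. It assumes accesses are compared as byte ranges from a shared base, and it uses UINT_MAX to mean "extent unknown" (mirroring the value named above). `MemOp` and `mayOverlap` are hypothetical and are not the MachineMemOperand or alias-analysis API.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical memory operand: a byte offset from a shared base and a size.
// In this sketch a size of UINT_MAX means "extent unknown".
struct MemOp {
  int64_t offset;
  uint64_t size;
};

// Two accesses may alias if either extent is unknown or their byte ranges
// [offset, offset + size) overlap.
bool mayOverlap(const MemOp &a, const MemOp &b) {
  if (a.size == UINT_MAX || b.size == UINT_MAX)
    return true;
  return a.offset < b.offset + static_cast<int64_t>(b.size) &&
         b.offset < a.offset + static_cast<int64_t>(a.size);
}
```

Once the epilog memoperands carry the unknown size, a load and a store to the same stack slot can no longer be proven disjoint, so they cannot be reordered.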

Patch by Brendon Cahoon.

llvm-svn: 328508
2018-03-26 15:45:55 +00:00
Krzysztof Parzyszek 56f0fc4716 [Hexagon] Give priority to post-incrementing memory accesses in LSR
llvm-svn: 328506
2018-03-26 15:32:03 +00:00
Simon Pilgrim 0b73b29388 [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs
Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write)

This also adds missing vcvttss2si tests 

llvm-svn: 328505
2018-03-26 15:30:47 +00:00
Simon Pilgrim 3aa9344605 [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKPS instruction costs
These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults.

llvm-svn: 328501
2018-03-26 14:44:24 +00:00
Simon Pilgrim 67df1cf597 [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs
The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps

llvm-svn: 328497
2018-03-26 14:03:40 +00:00
Simon Pilgrim caa203aed5 [X86][Btver2] Double the AGU and schedule pipe resources for YMM
Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model.

llvm-svn: 328491
2018-03-26 13:15:20 +00:00
Hans Wennborg 311b63f13b Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32"
This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up
patch at D44876 fixes this, but let's revert back to green for now until that's
ready to land.

(Also reverts r328443.)

> Both GCC and MSVC only look at the low byte of a boolean when it is
> passed.

llvm-svn: 328482
2018-03-26 10:07:51 +00:00
Craig Topper 6f28d3c954 [X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT.
llvm-svn: 328474
2018-03-26 05:05:12 +00:00
Craig Topper cdfcf8ecda [X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models.
I've used Agner's data as best I could to settle on the values.

llvm-svn: 328473
2018-03-26 05:05:10 +00:00
Craig Topper fbf2d850e3 [X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions.
llvm-svn: 328472
2018-03-26 04:20:36 +00:00
Craig Topper 659f85af14 [X86] Swap the itineraries on the memory and register forms of CVTDQ2PD.
They were backwards.

llvm-svn: 328469
2018-03-26 02:17:13 +00:00
Craig Topper 15fef89ad9 [X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake.
This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX.

llvm-svn: 328465
2018-03-25 23:40:56 +00:00
Mandeep Singh Grang 98bc25a0f2 [RISCV] Use init_array instead of ctors for RISCV target, by default
Summary:
LLVM defaults to the newer .init_array/.fini_array scheme for static
constructors rather than the less desirable .ctors/.dtors (the UseCtors
flag defaults to false). This wasn't being respected in the RISC-V
backend because it failed to call TargetLoweringObjectFileELF::InitializeELF
with the appropriate flag for UseInitArray.
This patch fixes this by implementing RISCVELFTargetObjectFile and overriding
its Initialize method to call InitializeELF(TM.Options.UseInitArray).

Reviewers: asb, apazos

Reviewed By: asb

Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits

Differential Revision: https://reviews.llvm.org/D44750

llvm-svn: 328433
2018-03-24 18:37:19 +00:00
Simon Pilgrim 913345f8f5 [X86][AES] Ensure we're testing both non-VEX/VEX variants of AES instructions on AVX targets
Add skylake server tests as well

llvm-svn: 328424
2018-03-24 15:05:12 +00:00
Simon Pilgrim 91fe24b8cf [X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE instructions on AVX targets
And ensure we don't use later instruction sets in SSE schedule tests

llvm-svn: 328423
2018-03-24 14:51:52 +00:00
Simon Pilgrim f7d0f7e6db [X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule tests
llvm-svn: 328421
2018-03-24 13:47:48 +00:00
Simon Pilgrim d2016f95fb [X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule tests
llvm-svn: 328420
2018-03-24 13:47:01 +00:00
Craig Topper 2c0a62ab9a [X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes
These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them.
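
A scalar model of one 64-bit lane shows why the upper input bits are not demanded. It assumes the usual PMULUDQ/PMULDQ semantics: the low 32 bits of each element are multiplied after being zero-extended or sign-extended, respectively. `pmuludqLane`, `pmuldqLane` and `kDemandedInputBits` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 64-bit lane of PMULUDQ: only the low
// 32 bits of each operand are read, zero-extended, and multiplied.
uint64_t pmuludqLane(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>(static_cast<uint32_t>(a)) *
         static_cast<uint64_t>(static_cast<uint32_t>(b));
}

// Hypothetical scalar model of one lane of PMULDQ: the low 32 bits are
// sign-extended instead.
int64_t pmuldqLane(uint64_t a, uint64_t b) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(a))) *
         static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(b)));
}

// The bits of each input that can affect the result.
constexpr uint64_t kDemandedInputBits = 0xFFFFFFFFull;
```

Because bits 32-63 of each input never change the result, a combine can simplify each operand under a 32-bit demanded mask. For example, it can drop an AND with 0xFFFFFFFF that feeds the node.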

Differential Revision: https://reviews.llvm.org/D44375

llvm-svn: 328405
2018-03-24 01:52:01 +00:00
Reid Kleckner e27b410661 [X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32
Both GCC and MSVC only look at the low byte of a boolean when it is
passed.

llvm-svn: 328386
2018-03-23 23:38:53 +00:00
Krzysztof Parzyszek bcf0a96f9e [Hexagon] Boost profit for word-mask immediates, reduce for others
This avoids unnecessary splitting due to uninteresting immediates.

llvm-svn: 328364
2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek e247526cc9 [Hexagon] Fold offset in base+immediate loads/stores
Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz  =>  memw(Rx,#n) = Rz.
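
The pattern can be sketched on a toy instruction representation. `AddImm`, `StoreW` and `foldOffset` are hypothetical and are not the Hexagon backend's data structures. The sketch deliberately omits the legality checks a real pass needs, such as confirming that Ry has no other uses and that #n fits the store's offset field.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy form of  Ry = add(Rx,#n).
struct AddImm {
  std::string dst, src;
  int imm;
};

// Hypothetical toy form of  memw(Rb+#off) = Rv.
struct StoreW {
  std::string base;
  int off;
  std::string val;
};

// Fold  Ry = add(Rx,#n); memw(Ry+#0) = Rz  into  memw(Rx,#n) = Rz.
// Legality checks (other uses of Ry, Rz != Ry, immediate range) omitted.
std::optional<StoreW> foldOffset(const AddImm &add, const StoreW &st) {
  if (st.base != add.dst || st.off != 0)
    return std::nullopt;
  return StoreW{add.src, add.imm, st.val};
}
```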

Patch by Jyotsna Verma.

llvm-svn: 328355
2018-03-23 19:30:34 +00:00
Tony Tye 88441a3d1e [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44697

llvm-svn: 328351
2018-03-23 18:58:47 +00:00
Tony Tye 7a893d4e34 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.

Differential Revision: https://reviews.llvm.org/D43736

llvm-svn: 328349
2018-03-23 18:45:18 +00:00
Krzysztof Parzyszek 5f7ba9a74c [Hexagon] Always generate mux out of predicated transfers if possible
HexagonGenMux would collapse pairs of predicated transfers only if it assumed
that the predicated .new forms cannot be created. It turns out that generating
mux is preferable in almost all cases.
Introduce an option -hexagon-gen-mux-threshold that controls the minimum
distance between the instruction defining the predicate and the later of
the two transfers. If the distance is less than the threshold, mux will
not be generated. Set the threshold to 0 by default.
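
A hedged sketch of the threshold rule as described. It assumes "distance" is counted in instruction positions within the block. `shouldGenerateMux` and its parameters are hypothetical and are not the HexagonGenMux interface.

```cpp
#include <cassert>

// Decide whether to generate mux from two predicated transfers, given the
// positions of the predicate definition and of both transfers.
bool shouldGenerateMux(unsigned predDefPos, unsigned transfer1Pos,
                       unsigned transfer2Pos, unsigned threshold) {
  unsigned later = transfer1Pos > transfer2Pos ? transfer1Pos : transfer2Pos;
  assert(later > predDefPos && "transfers follow the predicate definition");
  unsigned distance = later - predDefPos;
  return distance >= threshold; // closer than the threshold: no mux
}
```

With the default threshold of 0, the check never blocks a mux.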

llvm-svn: 328346
2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek 80f10e4fe5 [Hexagon] Avoid early if-conversion for one sided branches
Patch by Anand Kodnani.

llvm-svn: 328344
2018-03-23 18:00:18 +00:00
Ana Pazos 41573804f2 [ARM] Fix "Constant pool entry out of range!" in Thumb1 mode
This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode.

In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode,
adjustBBOffsetsAfter() does not calculate postOffset correctly because it
does not properly account for the padding required for the constant pool
that immediately follows the jump table branch instruction.
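
A generic sketch of accounting for that padding. It assumes the constant pool must start on a 4-byte boundary. `alignUp` and `postOffsetWithPadding` are hypothetical helpers, not the ARMConstantIslands code.

```cpp
#include <cassert>
#include <cstdint>

// Round offset up to a multiple of align (which must be a power of two).
uint64_t alignUp(uint64_t offset, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two");
  return (offset + align - 1) & ~(align - 1);
}

// Offset just past a block that is immediately followed by a constant pool
// requiring `align`-byte alignment: the padding counts toward the offset.
uint64_t postOffsetWithPadding(uint64_t blockOffset, uint64_t blockSize,
                               uint64_t align) {
  return alignUp(blockOffset + blockSize, align);
}
```

A 2-byte Thumb branch at 0x100 ends at 0x102, but the following pool starts at 0x104. Ignoring those 2 bytes of padding underestimates every later offset, which can push a constant pool entry out of range.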

Reviewers: t.p.northover, eli.friedman

Reviewed By: t.p.northover

Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D44709

llvm-svn: 328341
2018-03-23 17:53:27 +00:00
Krzysztof Parzyszek 570c6440cd [Hexagon] Two fixes in early if-conversion
- Fix checking for vector predicate registers.
- Avoid speculating llvm.lifetime.end intrinsic.

Patch by Harsha Jagasia and Brendon Cahoon.

llvm-svn: 328339
2018-03-23 17:46:09 +00:00
Simon Pilgrim e5c0a041ff [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit
Add missing non-VEX and (V)PMOVMSKB instructions to the pattern

llvm-svn: 328338
2018-03-23 17:38:59 +00:00
Zaara Syeda 6535993625 Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates over each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy; isCopyFeedingInvariantStore identifies such a copy so
that it can be hoisted as well.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
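
The operand check can be sketched with a hypothetical operand type and a caller-supplied predicate standing in for isCallerPreservedPhysReg. `Operand` and `isInvariantStoreSketch` are illustrative only, and the real MachineLICM code performs additional checks not shown here.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical operand of a store: either a register or something else
// (an immediate offset, a frame index, ...).
struct Operand {
  bool isReg;
  unsigned reg;
};

// The store is treated as invariant only if every register operand is
// caller preserved (for example, the stack pointer, which the PowerPC
// change marks as caller preserved).
bool isInvariantStoreSketch(
    const std::vector<Operand> &ops,
    const std::function<bool(unsigned)> &isCallerPreserved) {
  for (const Operand &op : ops)
    if (op.isReg && !isCallerPreserved(op.reg))
      return false;
  return true;
}
```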

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 328326
2018-03-23 15:28:15 +00:00
John Brawn e3b44f9de6 [AArch64] Don't reduce the width of loads if it prevents combining a shift
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.

Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
making it prevent the width reduction if this is what would happen, while still
allowing it if reducing the load width will let us eliminate a later sign or zero
extend.
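
The addressing constraint in the first paragraph can be sketched as a predicate. It assumes AArch64 register-offset addressing ([Xn, Xm, LSL #s]), where s must be 0 or log2 of the access size in bytes. `log2AccessSize` and `canFoldShiftIntoAccess` are hypothetical names, not the AArch64 backend's functions.

```cpp
#include <cassert>

// log2 of a load/store size in bytes, or -1 for sizes this sketch does
// not model.
int log2AccessSize(unsigned bytes) {
  switch (bytes) {
  case 1: return 0;
  case 2: return 1;
  case 4: return 2;
  case 8: return 3;
  case 16: return 4;
  default: return -1;
  }
}

// A shift of the offset register can be folded into the access only when
// it is 0 or matches the access size.
bool canFoldShiftIntoAccess(unsigned shiftAmt, unsigned accessBytes) {
  int scale = log2AccessSize(accessBytes);
  return scale >= 0 &&
         (shiftAmt == 0 || shiftAmt == static_cast<unsigned>(scale));
}
```

Consider an 8-byte load from base + (idx << 3): it can absorb the shift. Narrowing it to a 4-byte load because of a later trunc would leave a shift of 3 that no longer matches the access size.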

Differential Revision: https://reviews.llvm.org/D44794

llvm-svn: 328321
2018-03-23 14:47:07 +00:00