Commit Graph

49110 Commits

Author SHA1 Message Date
Sameer Sahasrabuddhe 2de7653fd5 [AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll

Differential Revision: https://reviews.llvm.org/D52221

llvm-svn: 342722
2018-09-21 11:26:55 +00:00
Alexander Timofeev 36617f0160 [AMDGPU] Divergence driven instruction selection. Part 1.
Summary: This change is the first part of the AMDGPU target description
    change. The aim of it is the effective splitting the vector and scalar
    flows at the selection stage. Selection uses predicate functions based
    on the framework implemented earlier - https://reviews.llvm.org/D35267

    Differential revision: https://reviews.llvm.org/D52019

    Reviewers: rampitec

llvm-svn: 342719
2018-09-21 10:31:22 +00:00
Yonghong Song 150ca5143b bpf: check illegal usage of XADD insn return value
Currently, BPF has XADD (locked add) insn support and the
asm looks like:
  lock *(u32 *)(r1 + 0) += r2
  lock *(u64 *)(r1 + 0) += r2
The instruction itself does not have a return value.

At the source code level, users often use
  __sync_fetch_and_add()
which eventually translates to XADD. The return value of
__sync_fetch_and_add() is supposed to be the old value
in the xadd memory location. Since BPF::XADD insn does not
support such a return value, this patch added a PreEmit
phase to check such a usage. If such an illegal usage
pattern is detected, a fatal error will be reported like
  line 4: Invalid usage of the XADD return value
if compiled with -g, or
  Invalid usage of the XADD return value
if compiled without -g.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 342692
2018-09-20 22:24:27 +00:00
Thomas Lively 6f21a13675 [WebAssembly] Add V128 value type to binary format
Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD
calls to allow the tests to work. Also removes some dead code that
would otherwise have to have been updated.

Reviewers: aheejin, dschuff, sbc100

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52105

llvm-svn: 342689
2018-09-20 22:04:44 +00:00
Sanjay Patel 8a1227ccc8 [SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. 
No functional change intended.

I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be 
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.

Differential Revision: https://reviews.llvm.org/D52285

llvm-svn: 342669
2018-09-20 17:34:08 +00:00
Simon Pilgrim 3e2de767f6 [X86][SSE] Remove UNPCKL(SHUFFLE)->UNPCKH custom combine
This can be achieved more generally by combineX86ShufflesRecursively.

llvm-svn: 342645
2018-09-20 13:10:22 +00:00
Simon Pilgrim 46c1dcb1af [X86][SSE] Remove PSHUFLW/PSHUFHW combineRedundantHalfShuffle combine
This can be achieved more generally by combineX86ShufflesRecursively and was causing a fuzz test failure found by Mikael Holmén.

llvm-svn: 342642
2018-09-20 12:11:38 +00:00
Alex Bradbury 96ed75d066 [RISCV][MC] Modify evaluateConstantImm interface to allow reuse from addExpr
This is a trivial refactoring that I'm committing now as it makes a patch I'm 
about to post for review easier to follow. There is some overlap between 
evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows 
evaluateConstantImm to be reused from addExpr to remove this overlap. The 
benefit will be greater when a future patch adds extra code to allows 
immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`).

No functional change intended.

llvm-svn: 342641
2018-09-20 11:40:43 +00:00
Alex Bradbury 226f3ef5a5 [RISCV][MC] Improve parsing of jal/j operands
Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas 
but rejected by LLVM MC. This patch rectifies this. I introduce 
RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with 
register names can safely be parsed. This is made a somewhat fiddly due to the 
single-operand alias form (see the comment in parseJALOffset for more info).

Differential Revision: https://reviews.llvm.org/D52029

llvm-svn: 342629
2018-09-20 08:10:35 +00:00
Maya Madhavan ec1efe4ee3 Fix for bug 34002 - label generated before it block is finalized. Differential Revision: https://reviews.llvm.org/D52258
llvm-svn: 342615
2018-09-20 05:11:42 +00:00
QingShan Zhang cae9425a3c [PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072

llvm-svn: 342611
2018-09-20 03:09:15 +00:00
Thomas Lively f45de47c59 [WebAssembly] Renumber SIMD ops
Summary:
This change leaves holes in the opcode space where missing
instructions could logically be added later if they were found to be
useful.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52282

llvm-svn: 342610
2018-09-20 02:55:28 +00:00
Matthias Braun 28d6a4ac9a AArch64: Add FuseCryptoEOR fusion rules
There's some additional rules available on newer apple CPUs.

rdar://41235346

llvm-svn: 342590
2018-09-19 20:50:51 +00:00
Evandro Menezes 8a6973d6ff [ARM] Adjust the feature set for Exynos
Fine tune the cost model for all Exynos processors.

llvm-svn: 342585
2018-09-19 19:51:29 +00:00
Evandro Menezes c62ab61173 [ARM] Refactor Exynos feature set (NFC)
Since all Exynos processors share the same feature set, fold them in the
implied fatures list for the subtarget.

llvm-svn: 342583
2018-09-19 19:43:23 +00:00
Simon Pilgrim 2d0f20cc04 [X86] Handle COPYs of physregs better (regalloc hints)
Enable enableMultipleCopyHints() on X86.

Original Patch by @jonpa:

While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that is resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 less register moves on SPEC, as well as marginally less spilling.

Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint(). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.

Differential Revision: https://reviews.llvm.org/D38128

llvm-svn: 342578
2018-09-19 18:59:08 +00:00
Sanjay Patel 1a1c0ee599 [x86] change names of vector splitting helper functions; NFC
As the code comments suggest, these are about splitting, and they
are not necessarily limited to lowering, so that misled me.

There's nothing that's actually x86-specific in these either, so 
they might be better placed in a common header so any target can 
use them.

llvm-svn: 342575
2018-09-19 18:52:00 +00:00
Simon Atanasyan a9e8765e3e [mips][microMIPS] Extending size reduction pass with MOVEP
The patch extends size reduction pass for MicroMIPS. Two MOVE
instructions are transformed into one MOVEP instrucition.

Patch by Milena Vujosevic Janicic.

Differential revision: https://reviews.llvm.org/D52037

llvm-svn: 342572
2018-09-19 18:46:29 +00:00
Simon Atanasyan 852dd83be8 [mips][microMIPS] Fix the definition of MOVEP instruction
The patch fixes definition of MOVEP instruction. Two registers are used
instead of register pairs. This is necessary as machine verifier cannot
handle register pairs.

Patch by Milena Vujosevic Janicic.

Differential revision: https://reviews.llvm.org/D52035

llvm-svn: 342571
2018-09-19 18:46:21 +00:00
Simon Pilgrim 8191d63c3b [X86] Add initial SimplifyDemandedVectorEltsForTargetNode support
This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.

Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.

Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.

NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.

Differential Revision: https://reviews.llvm.org/D52140

llvm-svn: 342564
2018-09-19 18:11:34 +00:00
Carl Ritson 6b8d75425e [AMDGPU] Add instruction selection for i1 to f16 conversion
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52018

Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
2018-09-19 16:32:12 +00:00
Yonghong Song 5b476c5a9f [bpf] Symbol sizes and types in object file
Clang-compiled object files currently don't include the symbol sizes and
types.  Some tools however need that information.  For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
llvm-svn: 342556
2018-09-19 16:04:13 +00:00
Andrea Di Biagio 8b6c314be1 [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions.
This patch adds the ability for processor models to describe dependency breaking
instructions.

Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.

The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).

```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
  return false;
}

virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
  return isZeroIdiom(MI);
}
```

An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.

A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.

STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.

This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.

This patch supersedes the one committed at r338372 (phabricator review: D49310).

The main advantages are:
 - We can describe subtarget predicates via tablegen using STIPredicates.
 - We can describe zero-idioms / dep-breaking instructions directly via
   tablegen in the scheduling models.

In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
 - Teach how to identify optimizable register-register moves
 - Teach how to identify slow LEA instructions (each subtarget defining its own
   concept of "slow" LEA).
 - Teach how to identify instructions that have undocumented false dependencies
   on the output registers on some processors only.

It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.

This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.

Differential Revision: https://reviews.llvm.org/D52174

llvm-svn: 342555
2018-09-19 15:57:45 +00:00
Sanjay Patel 4fd2e2a498 [DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add
This is an alternative to D37896. I don't see a way to decompose multiplies 
generically without a target hook to tell us when it's profitable. 

ARM and AArch64 may be able to remove some duplicate code that overlaps with 
this transform.

As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474

As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.

Differential Revision: https://reviews.llvm.org/D52195

llvm-svn: 342554
2018-09-19 15:57:40 +00:00
Alex Bradbury 79518b02cd [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Oliver Stannard 0b835be7bb [ARM] Fix unwind information for floating point registers
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.

Patch by Jackson Woodruff!

Differential revision: https://reviews.llvm.org/D51494

llvm-svn: 342545
2018-09-19 13:25:31 +00:00
Calixte Denizet 7413a43886 Verify commit access in fixing typo
llvm-svn: 342538
2018-09-19 11:26:20 +00:00
Alex Bradbury 21aea51e71 [RISCV] Codegen for i8, i16, and i32 atomicrmw with RV32A
Introduce a new RISCVExpandPseudoInsts pass to expand atomic 
pseudo-instructions after register allocation. This is necessary in order to 
ensure that register spills aren't introduced between LL and SC, thus breaking 
the forward progress guarantee for the operation. AArch64 does something 
similar for CmpXchg (though only at O0), and Mips is moving towards this 
approach (see D31287). See also [this mailing list 
post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from 
James Knight, which summarises the issues with lowering to ll/sc in IR or 
pre-RA.

See the [accompanying RFC 
thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an 
overview of the lowering strategy.

Differential Revision: https://reviews.llvm.org/D47882

llvm-svn: 342534
2018-09-19 10:54:22 +00:00
Hans Wennborg 4195eb1068 [COFF] Emit @feat.00 on 64-bit and set the CFG bit when emitting guardcf tables
The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up
the .gfid$y table. I believe this is fine to set even if we don't emit
the instrumentation.

We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit
it, but I'm not entirely sure what the default value should be. I went
with zero since that seems as safe as not emitting the symbol in the
first place.

Differential Revision: https://reviews.llvm.org/D52235

llvm-svn: 342532
2018-09-19 09:58:30 +00:00
Thomas Lively ad7e9e9f60 [WebAssembly][NFC] Remove extra space in WebAssemblyInstrSIMD.td
llvm-svn: 342522
2018-09-19 00:54:20 +00:00
Matthias Braun 934be5fecf AArch64MacroFusion: Factor out some opcode handling code; NFC
llvm-svn: 342521
2018-09-19 00:23:37 +00:00
Matthias Braun 726e12cf0c ScheduleDAG: Cleanup dumping code; NFC
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
  `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
  places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
  functions. They were only ever used for debug dumping and putting the
  function into ScheduleDAG is consistent with the `dumpNode()` change.

llvm-svn: 342520
2018-09-19 00:23:35 +00:00
Thomas Lively aaf4e2cbba [WebAssembly] v4f32.abs and v2f64.abs
Summary: implement lowering of @llvm.fabs for vector types.

Reviewers: aheejin, dschuff

Subscribers:

llvm-svn: 342513
2018-09-18 21:45:12 +00:00
Farhana Aleen f5a2848376 [AMDGPU] Match udot8 pattern
Summary: D.u32 = S0.u4[0] * S1.u4[0] +

         S0.u4[1] * S1.u4[1] +
         S0.u4[2] * S1.u4[2] +
         S0.u4[3] * S1.u4[3] +
         S0.u4[4] * S1.u4[4] +
         S0.u4[5] * S1.u4[5] +
         S0.u4[6] * S1.u4[6] +
         S0.u4[7] * S1.u4[7] +
         S2.u32

Author: FarhanaAleen

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D51947

llvm-svn: 342497
2018-09-18 16:59:48 +00:00
Alex Bradbury 68f73c1206 [RISCV][MC] Use a custom ParserMethod for the bare_symbol operand type
This allows the hard-coded shouldForceImmediate logic to be removed because 
the generated MatchOperandParserImpl makes use of the current context (i.e. 
the current mnemonic) to determine parsing behaviour, and so won't first try 
to parse a register before parsing a symbol name.

No functional change is intended. gas accepts immediate arguments for call, 
tail and lla. This patch doesn't address this discrepancy.

Differential Revision: https://reviews.llvm.org/D51733

llvm-svn: 342488
2018-09-18 15:18:16 +00:00
Alex Bradbury 7d0e18d0dd [RISCV][MC] Reject bare symbols for the simm12 operand type
addi a0, a0, foo and lw a0, foo(a0) and similar are now rejected. An explicit 
%lo and %pcrel_lo modifier is required. This matches gas behaviour.

llvm-svn: 342487
2018-09-18 15:13:29 +00:00
Alex Bradbury 74340f1805 [RISCV][MC] Tighten up checking of sybol operands to lui and auipc
Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for 
lui. Also test valid operand modifiers in rv32i-valid.s.

Note this is slightly stricter than gas, which will accept either %pcrel_hi or 
%hi for both lui and auipc.

Differential Revision: https://reviews.llvm.org/D51731

llvm-svn: 342486
2018-09-18 15:08:35 +00:00
Nemanja Ivanovic 87c31a6113 [PowerPC] Do not emit record-form rotates when record-form andi/andis suffices
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.

This patch increases the number of record-form rotates we eliminate by
more than 70%.

Differential revision: https://reviews.llvm.org/D44897

llvm-svn: 342478
2018-09-18 13:43:16 +00:00
Nemanja Ivanovic 6a39d32e66 [PowerPC] Optimize compares fed by ANDISo
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.

Differential revision: https://reviews.llvm.org/D51353

llvm-svn: 342472
2018-09-18 13:21:58 +00:00
Simon Pilgrim e9bf71e761 [X86][SSE] LowerShift - pull out repeated getTargetVShiftUniformOpcode calls. NFCI.
llvm-svn: 342462
2018-09-18 10:44:44 +00:00
David Green 85d6a55995 [AArch64] Attempt to parse more operands as expressions
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.

This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.

Differential Revision: https://reviews.llvm.org/D51792

llvm-svn: 342455
2018-09-18 09:44:53 +00:00
Matt Arsenault ebf46143ea AMDGPU: Don't form fmed3 if it will require materialization
If there is a single use constant, it can be folded into the
min/max, but not into med3.

llvm-svn: 342443
2018-09-18 02:34:54 +00:00
QingShan Zhang f1b0b47b2d [PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, 
because we can still get same latency due to default values.

With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.

This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, 
then causing different scheduling or suboptimal scheduling further.

Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040

llvm-svn: 342441
2018-09-18 02:05:18 +00:00
Matt Arsenault 9d49c449ec AMDGPU: Expand vector canonicalizes
llvm-svn: 342439
2018-09-18 01:51:33 +00:00
Volodymyr Sapsai 703ab84cf5 Revert "[ARM] Cleanup ARM CGP isSupportedValue"
This reverts r342395 as it caused error

> Argument value type does not match pointer operand type!
>   %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
>  i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!

on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

More details are available at https://reviews.llvm.org/D52080

llvm-svn: 342431
2018-09-18 00:11:55 +00:00
Simon Atanasyan 9265dca8b5 [mips] Fix MIPS N32 ABI triples support
Add support mips64(el)-linux-gnuabin32 triples, and set them to N32.
Debian architecture name mipsn32/mipsn32el are also added. Set
UseIntegratedAssembler for N32 if we can detect it.

Patch by YunQiang Su.

Differential revision: https://reviews.llvm.org/D51408

llvm-svn: 342416
2018-09-17 21:21:57 +00:00
Keno Fischer c8ccaed325 [X86ISel] Implement byval lowering for Win64 calling convention
Summary:
The IR reference for the `byval` attribute states:

```
This indicates that the pointer parameter should really be passed by value
to the function. The attribute implies that a hidden copy of the pointee is
made between the caller and the callee, so the callee is unable to modify
the value in the caller. This attribute is only valid on LLVM pointer arguments.
```

However, on Win64, this attribute is unimplemented and the raw pointer is
passed to the callee instead. This is problematic, because frontend authors
relying on the implicit hidden copy (as happens for every other calling
convention) will see the passed value silently (if mutable memory) or
loudly (by means of a crash) modified because the callee treats the
location as scratch memory space it is allowed to mutate.

At this point, it's worth taking a step back to understand the context.
In most calling conventions, aggregates that are too large to be passed
in registers, instead get *copied* to the stack at a fixed (computable
from the signature) offset of the stack pointer. At the LLVM, we hide
this hidden copy behind the byval attribute. The caller passes a pointer
to the desired data and the callee receives a pointer, but these pointers
are not the same. In particular, the pointer that the callee receives
points to temporary stack memory allocated as part of the call lowering.
In most calling conventions, this pointer is never realized in registers
or memory. The temporary memory is simply defined by an implicit
offset from the stack pointer at function entry.

Win64, uniquely, works differently. The structure is still passed in
memory, but instead of being stored at an implicit memory offset, the
caller computes a pointer to the temporary memory and passes it to
the callee as a regular pointer (taking up a register, or if all
registers are taken up, an additional stack slot). Presumably, this
was done to allow eliding the copy when passing aggregates through
several functions on the stack.

This explains why ignoring the `byval` attribute mostly works on Win64.
The argument simply gets passed as a pointer and as long as we're ok
with the callee trampling all over that memory, there are no ill effects.
However, it does contradict the documentation of the `byval` attribute
which specifies that there is to be an implicit copy.

Frontends can of course work around this by never emitting the `byval`
attribute for Win64 and creating `alloca`s for the requisite temporary
stack slots (and that does appear to be what frontends are doing).
However, the presence of the `byval` attribute is not a trap for
frontend authors, since it seems to work, but silently modifies the
passed memory contrary to documentation.

I see two solutions:
- Disallow the `byval` attribute in the verifier if using the Win64
  calling convention.
- Make it work by simply emitting a temporary stack copy as we would
  with any other calling convention (frontends can of course always
  not use the attribute if they want to elide the copy).

This patch implements the second option (make it work), though I would
be fine with the first also.

Ref: https://github.com/JuliaLang/julia/issues/28338

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51842

llvm-svn: 342402
2018-09-17 17:37:14 +00:00
Stanislav Mekhanoshin 06d3b4139e [AMDGPU] Initialize instruction itinerary from GCNSubtarget
I need to use it in the GCN codegen.

Differential Revision: https://reviews.llvm.org/D52123

llvm-svn: 342400
2018-09-17 16:04:32 +00:00
Sam Parker 481cdab919 [ARM] Cleanup ARM CGP isSupportedValue
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.

Differential Revision: https://reviews.llvm.org/D52080

llvm-svn: 342395
2018-09-17 13:57:39 +00:00
Sam Parker 76d25d7f55 [ARM] Disallow icmp with negative imm and overflow
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.

Differential Revision: https://reviews.llvm.org/D52102

llvm-svn: 342392
2018-09-17 13:48:25 +00:00