Now the library names in the Makefiles match the library names in
LLVMBuild.txt.
This should hopefully fix the remaining bot failures.
llvm-svn: 239661
r213101 changed the behaviour of this method to not only affect the
PostMachineScheduler scheduler but also the PostRAScheduler scheduler,
renaming should make this fact clear. Also document that the preferred
way is to specify this in the scheduling model instead of overriding
this method.
Differential Revision: http://reviews.llvm.org/D10427
llvm-svn: 239659
This will use Itinieraries if available, but will also work if just a
MCSchedModel is available.
Differential Revision: http://reviews.llvm.org/D10428
llvm-svn: 239658
- Add glc, slc, and tfe operands to flat instructions
- Add missing flat instructions
- Fix the encoding of flat_load_dwordx3 and flat_store_dwordx3.
llvm-svn: 239637
The alignment is not required, so we can just remove it for now.
The old code is a hack as it depends on the buffer management to find
the current column.
If the alignment is really desirable, the proper way to do it is
to pass in a formatted_raw_stream that knows the current column.
llvm-svn: 239603
We were putting them in the filter field, which is correct for 64-bit
but wrong for 32-bit.
Also switch the order of scope table entry emission so outermost entries
are emitted first, and fix an obvious state assignment bug.
llvm-svn: 239574
Remove the EFLAGS from the stackmap live-out mask. The EFLAGS register is not
supposed to be part of that set, because the X86 calling conventions mark the
register as NOT preserved.
Also remove the IP registers, since spilling and restoring those doesn't really
make any sense.
Related to rdar://problem/21019635.
llvm-svn: 239568
This intrinsic is like framerecover plus a load. It recovers the EH
registration stack allocation from the parent frame and loads the
exception information field out of it, giving back a pointer to an
EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter
expressions instead of accessing the EXCEPTION_POINTERS parameter that
is available on x64.
This required a minor change to MC to allow defining a label variable to
another absolute framerecover label variable.
llvm-svn: 239567
Summary:
For the moment, TargetMachine::getTargetTriple() still returns a StringRef.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10362
llvm-svn: 239554
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."
The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.
llvm-svn: 239544
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, jfb, rengolin
Differential Revision: http://reviews.llvm.org/D10361
llvm-svn: 239538
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:
1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.
The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.
Tested on SSE2, SSE41 and AVX machines.
Differential Revision: http://reviews.llvm.org/D9474
llvm-svn: 239509
This patch corresponds to review:
http://reviews.llvm.org/D10096
This is the back end portion of the patch related to D10095.
The patch adds the instructions and back end intrinsics for:
vbpermq
vgbbd
llvm-svn: 239505
This reverts commit r239437.
This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol
directly instead of using it to do an indirect function call.
llvm-svn: 239502
It hasn't been used since r130964.
This also removes MachineModuleInfo::isUsedFunction and
MachineModuleInfo::AnalyzeModule, both of which were only
there to support UsedFunctions.
llvm-svn: 239501
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.
Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.
The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.
This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.
Differential Revision: http://reviews.llvm.org/D10321
llvm-svn: 239486
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10311
llvm-svn: 239467
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10307
llvm-svn: 239465
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: echristo, rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10243
llvm-svn: 239464
Use a "safeseh" string attribute to do this. You would think we chould
just accumulate the set of personalities like we do on dwarf, but this
fails to account for the LSDA-loading thunks we use for
__CxxFrameHandler3. Each of those needs to make it into .sxdata as well.
The string attribute seemed like the most straightforward approach.
llvm-svn: 239448
Summary:
We used to assume V->RAUW only modifies the operand list of V's user.
However, if V and V's user are Constants, RAUW may replace and invalidate V's
user entirely.
This patch fixes the above issue by letting the caller replace the
operand instead of calling RAUW on Constants.
Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
Reviewers: broune, jholewinski
Reviewed By: broune, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10345
llvm-svn: 239435
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.
The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.
llvm-svn: 239433
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.
Previously, the read of w1 (see below) prevented the formation of a stp.
str w0, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
str w1, [x2, #4]
ret
We now generate the following code.
stp w0, w1, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
ret
All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.
llvm-svn: 239432
that was resetting it.
Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis.
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.
Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.
rdar://problem/13752163
Differential Revision: http://reviews.llvm.org/D10099
llvm-svn: 239427
array of bytes. The generation of this byte arrays was expecting
the host to be little endian, which prevents big endian hosts to be
used in the generation of the PTX code. This patch fixes the
problem by changing the way the bytes are extracted so that it
works for either little and big endian.
llvm-svn: 239412
Summary:
For some branches, GAS accepts an immediate instead of the 2nd register operand.
We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D9666
llvm-svn: 239396
Summary:
This cleans up most allocas NVPTXLowerKernelArgs emits for byval
parameters.
Test Plan: makes bug21465.ll more stronger to verify no redundant local load/store.
Reviewers: eliben, jholewinski
Reviewed By: eliben, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10322
llvm-svn: 239368
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.
Reviewers: pete, ributzka, uweigand, mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, ted, llvm-commits
Differential Revision: http://reviews.llvm.org/D10174
llvm-svn: 239336
on a per-function basis.
Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.
rdar://problem/20542263
Differential Revision: http://reviews.llvm.org/D8717
llvm-svn: 239325
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10260
llvm-svn: 239309
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.
llvm-svn: 239302
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.
Differential Revision: http://reviews.llvm.org/D10310
llvm-svn: 239300
These are added mainly for the benefit of clang, but this also means that they
are now allowed in .fpu directives and we emit the correct .fpu directive when
single-precision-only is used.
Differential Revision: http://reviews.llvm.org/D10238
llvm-svn: 239151
Add getFPUFeatures to TargetParser, which gets the list of subtarget features
that are enabled/disabled for each FPU, and use it when handling the .fpu
directive.
No functional change in this commit, though clang will start behaving
differently once it starts using this.
Differential Revision: http://reviews.llvm.org/D10237
llvm-svn: 239150
Summary:
Only restoring AvailableFeatures is not enough and will lead to buggy behaviour.
For example, if we have a feature enabled and we ".set pop", the next time we try
to ".set" that feature nothing will happen because the "!(STI.getFeatureBits()[Feature])"
check will be false, because we didn't restore STI.FeatureBits.
In order to fix this, we need to make MipsAssemblerOptions remember the STI.FeatureBits
instead of the AvailableFeatures and then regenerate AvailableFeatures each time we ".set pop".
This is because, AFAIK, there is no way to convert from AvailableFeatures back to STI.FeatureBits,
but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits).
I also moved the updating of AssemblerOptions inside the "if" statement in
setFeatureBits() and clearFeatureBits(), as there is no reason to update if
nothing changes.
Reviewers: dsanders, mkuper
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9156
llvm-svn: 239144
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.
Should fix PR23710.
Test Plan: Added test for the correct behavior based on the case I posted to PR23710.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10258
llvm-svn: 239111
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
Differential Revision: http://reviews.llvm.org/D10054
llvm-svn: 239087
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
llvm-svn: 239084
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.
Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll
Test Plan: lower-kernel-ptr-arg.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: wengxt, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10154
llvm-svn: 239082
Summary:
-march=bpf -> host endian
-march=bpf_le -> little endian
-match=bpf_be -> big endian
Test Plan:
v1 was tested by IBM s390 guys and appears to be working there.
It bit rots too fast here.
Reviewers: chandlerc, tstellarAMD
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10177
llvm-svn: 239071
Now that we sometimes know the address space, this can
theoretically do a better job.
This needs better test coverage, but this mostly depends on
first updating the loop optimizatiosn to provide the address
space.
llvm-svn: 239053
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: ted, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10236
llvm-svn: 239036
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:
- unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+ unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;
Original message:
Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239012
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.
While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.
llvm-svn: 239006
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.
The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 239001
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.
llvm-svn: 238935
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991
llvm-svn: 238923
Summary:
But still handle them the same way since I don't know how they differ on
this target.
Of these, /U[qytnms]/ do not have backend tests but are accepted by clang.
No functional change intended.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D8203
llvm-svn: 238921
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.
I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.
llvm-svn: 238916
This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....).
Since the refactoring of the shuffle lowering code this no longer has any use.
Differential Revision: http://reviews.llvm.org/D10169
llvm-svn: 238906
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.
This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238842
Summary:
With this change we are able to realign the stack dynamically, whenever it
contains objects with alignment requirements that are larger than the
alignment specified from the given ABI.
We have to use the $fp register as the frame pointer when we perform
dynamic stack realignment. In complex stack frames, with variably-sized
objects, we reserve additionally the callee-saved register $s7 as the
base pointer in order to reference locals.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8633
llvm-svn: 238829
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.
Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.
llvm-svn: 238821
Summary: These directives are used to set the current value of the SoftFloat feature.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D9074
llvm-svn: 238813
This create a MCSymbolELF class and moves SymbolSize since only ELF
needs a size expression.
This reduces the size of MCSymbol from 56 to 48 bytes.
llvm-svn: 238801
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.
Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.
Differential Revision: http://reviews.llvm.org/D9694
llvm-svn: 238795
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.
PR20927, rdar://18326194
Differential Revision: http://reviews.llvm.org/D8232
llvm-svn: 238793
Summary: Implement bswap intrinsic for MIPS FastISel. It's very different for misp32 r1/r2 .
Based on a patch by Reed Kotler.
Test Plan:
bswap1.ll
test-suite
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7219
llvm-svn: 238760
Summary:
Implement the intrinsics memset, memcopy and memmove in MIPS FastISel.
Make some needed infrastructure fixes so that this can work.
Based on a patch by Reed Kotler.
Test Plan:
memtest1.ll
The patch passes test-suite for mips32 r1/r2 and at O0/O2
Reviewers: rkotler, dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7158
llvm-svn: 238759
Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel.
Based on a patch by Reed Kotler.
Test Plan:
srem1.ll
div1.ll
test-suite at O0/O2 for mips32 r1/r2
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D7028
llvm-svn: 238757
Summary: Implement the LLVM IR select statement for MIPS FastISelsel.
Based on a patch by Reed Kotler.
Test Plan:
"Make check" test included now.
Passes test-suite at O2/O0 mips32 r1/r2.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D6774
llvm-svn: 238756
Summary:
The contents of the HI/LO registers are unpredictable after the execution of
the MUL instruction. In addition to implicitly defining these registers in the
MUL instruction definition, we have to mark those registers as dead too.
Without this the fast register allocator is running out of registers when the
MUL instruction is followed by another one that tries to allocate the AC0
register.
Based on a patch by Reed Kotler.
Reviewers: dsanders, rkotler
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D9825
llvm-svn: 238755