Commit Graph

33364 Commits

Author SHA1 Message Date
Reid Kleckner f12c030f48 [WinEH] Add 32-bit SEH state table emission prototype
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.

The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.

llvm-svn: 239433
2015-06-09 21:42:19 +00:00
Chad Rosier cf90acc104 [AArch64] Remove an overly conservative check when generating store pairs.
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.

Previously, the read of w1 (see below) prevented the formation of a stp.

        str      w0, [x2]
        ldr     w8, [x2, #8]
        add      w0, w8, w1
        str     w1, [x2, #4]
        ret

We now generate the following code.

        stp      w0, w1, [x2]
        ldr     w8, [x2, #8]
        add      w0, w8, w1
        ret

All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.

llvm-svn: 239432
2015-06-09 20:59:41 +00:00
Akira Hatanaka d9699bc7bd Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions
that was resetting it.

Remove the uses of DisableTailCalls in subclasses of TargetLowering and use
the value of function attribute "disable-tail-calls" instead. Also,
unconditionally add pass TailCallElim to the pipeline and check the function
attribute at the start of runOnFunction to disable the pass on a per-function
basis. 
 
This is part of the work to remove TargetMachine::resetTargetOptions, and since
DisableTailCalls was the last non-fast-math option that was being reset in that
function, we should be able to remove the function entirely after the work to
propagate IR-level fast-math flags to DAG nodes is completed.

Out-of-tree users should remove the uses of DisableTailCalls and make changes
to attach attribute "disable-tail-calls"="true" or "false" to the functions in
the IR.

rdar://problem/13752163

Differential Revision: http://reviews.llvm.org/D10099

llvm-svn: 239427
2015-06-09 19:07:19 +00:00
Samuel Antao cd50135a29 The constant initialization for globals in NVPTX is generated as an
array of bytes. The generation of this byte arrays was expecting 
the host to be little endian, which prevents big endian hosts to be 
used in the generation of the PTX code. This patch fixes the 
problem by changing the way the bytes are extracted so that it 
works for either little and big endian.

llvm-svn: 239412
2015-06-09 16:29:34 +00:00
Toma Tabacu 465acfd13c Recommit "[mips] [IAS] Restore STI.FeatureBits in .set pop." (r239144).
Specified the llvm namespace for the 2 calls to make_unique() which caused
compilation errors in Visual Studio 2013.

llvm-svn: 239405
2015-06-09 13:33:26 +00:00
Elena Demikhovsky 6b62b659cb X86-MPX: Implemented encoding for MPX instructions.
Added encoding tests.

llvm-svn: 239403
2015-06-09 13:02:10 +00:00
Aaron Ballman 3182ee92ba Removing spurious semi colons; NFC.
llvm-svn: 239399
2015-06-09 12:03:46 +00:00
Toma Tabacu 7977cfd52a Revert "[mips] [IAS] Add support for BNE and BEQ with an immediate operand." (r239396).
It was breaking buildbots.

llvm-svn: 239397
2015-06-09 10:43:49 +00:00
Toma Tabacu 5fa8fb5762 [mips] [IAS] Add support for BNE and BEQ with an immediate operand.
Summary:
For some branches, GAS accepts an immediate instead of the 2nd register operand.
We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D9666

llvm-svn: 239396
2015-06-09 10:34:31 +00:00
Daniel Sanders 329fc9b68a [nvptx] Only support the 'm' inline assembly memory constraint. NFC.
Summary:
NVPTX doesn't seem to support any additional constraints. Therefore remove
the target hook.

No functional change intended.

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8209

llvm-svn: 239395
2015-06-09 10:34:05 +00:00
Matt Arsenault 5881f4e1e4 R600: Switch to using generic min / max nodes.
llvm-svn: 239377
2015-06-09 00:52:37 +00:00
Matt Arsenault 8b643559d4 MC: Add target hook to control symbol quoting
llvm-svn: 239370
2015-06-09 00:31:39 +00:00
Jingyue Wu 2e4d1dd0ed [NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces
Summary:
This cleans up most allocas NVPTXLowerKernelArgs emits for byval
parameters.

Test Plan: makes bug21465.ll more stronger to verify no redundant local load/store.

Reviewers: eliben, jholewinski

Reviewed By: eliben, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10322

llvm-svn: 239368
2015-06-09 00:05:56 +00:00
Reid Kleckner b7403336ce [WinEH] Cache declarations of frame intrinsics
llvm-svn: 239361
2015-06-08 22:43:32 +00:00
Reid Kleckner 218a9593db Fix clang-cl self-host -Wc++11-narrowing bug
Use unsigned as the underlying storage type of the AMDGPU address space
enum.

llvm-svn: 239355
2015-06-08 21:57:57 +00:00
Ranjeet Singh 10511a493e [AArch64] AsmParser should be case insensitive about accepting vector register names.
Differential Revision: http://reviews.llvm.org/D10320

llvm-svn: 239353
2015-06-08 21:32:16 +00:00
Keno Fischer e70b31fc1b [InstrInfo] Refactor foldOperandImpl to thread through InsertPt. NFC
Summary:
This was a longstanding FIXME and is a necessary precursor to cases
where foldOperandImpl may have to create more than one instruction
(e.g. to constrain a register class). This is the split out NFC changes from
D6262.

Reviewers: pete, ributzka, uweigand, mcrosier

Reviewed By: mcrosier

Subscribers: mcrosier, ted, llvm-commits

Differential Revision: http://reviews.llvm.org/D10174

llvm-svn: 239336
2015-06-08 20:09:58 +00:00
Akira Hatanaka 4a61619ff5 [ARM] Pass a callback to FunctionPass constructors to enable skipping execution
on a per-function basis.

Previously some of the passes were conditionally added to ARM's pass pipeline
based on the target machine's subtarget. This patch makes changes to add those
passes unconditionally and execute them conditonally based on the predicate
functor passed to the pass constructors. This enables running different sets of
passes for different functions in the module.

rdar://problem/20542263

Differential Revision: http://reviews.llvm.org/D8717

llvm-svn: 239325
2015-06-08 18:50:43 +00:00
Pete Cooper 4915dd076f Remove includes of MCMachOSymbolFlags.h after it was deleted
llvm-svn: 239318
2015-06-08 17:25:57 +00:00
Matthias Braun 6f8db0e1a7 X86: Reject register operands with obvious type mismatches.
While we have some code to transform specification like {ax} into
{eax}/{rax} if the operand type isn't 16bit, we should reject cases
where there is no sane way to do this, like the i128 type in the
example.

Related to rdar://21042280

Differential Revision: http://reviews.llvm.org/D10260

llvm-svn: 239309
2015-06-08 16:56:23 +00:00
Colin LeMahieu 6aca6f0be5 [Hexagon] Adding functionality for searching for compound instruction pairs. Compound instructions reduce slot resource requirements freeing those packet slots up for more instructions.
llvm-svn: 239307
2015-06-08 16:34:47 +00:00
Javed Absar e1c7dc3ee2 ARM]: Add support for MMFR4_EL1 in assembler
This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler.
This register provides information about the implemented memory model and memory management support.

llvm-svn: 239302
2015-06-08 15:01:11 +00:00
Igor Breger 00d9f8457b AVX-512: Implemented 256/128bit VALIGND/Q instructions for SKX and KNL
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.

Differential Revision: http://reviews.llvm.org/D10310

llvm-svn: 239300
2015-06-08 14:03:17 +00:00
Simon Pilgrim 3a7718038d [X86] Added BitScanForward/BitScanReverse memory folding + tests
llvm-svn: 239257
2015-06-07 18:34:25 +00:00
Rafael Espindola f3d49b30b5 Handle 16 bit PC relative relocations.
Fixes pr23771.

llvm-svn: 239214
2015-06-06 02:29:56 +00:00
Peter Collingbourne 6679fc1a79 Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM."
as it caused miscompilations and assertion failures (PR23768,
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html).

llvm-svn: 239169
2015-06-05 18:01:28 +00:00
Alexei Starovoitov 8cf9a4c472 [bpf] rename triple names bpf_be -> bpfeb
llvm-svn: 239162
2015-06-05 16:11:14 +00:00
Colin LeMahieu be8c453d58 [Hexagon] Reapply r239097 with tests corrected for shuffling and duplexing.
llvm-svn: 239161
2015-06-05 16:00:11 +00:00
Benjamin Kramer 113b2a943f [ARM] Make helper function static.
This one had a declaration but it differed from the definition so the
declaration was actually dead.

llvm-svn: 239157
2015-06-05 14:32:54 +00:00
John Brawn 985c04e8fa [ARM] Add support for -sp- FPUs and FPU none to TargetParser
These are added mainly for the benefit of clang, but this also means that they
are now allowed in .fpu directives and we emit the correct .fpu directive when
single-precision-only is used.

Differential Revision: http://reviews.llvm.org/D10238

llvm-svn: 239151
2015-06-05 13:31:19 +00:00
John Brawn d03d22922d [ARM] Add knowledge of FPU subtarget features to TargetParser
Add getFPUFeatures to TargetParser, which gets the list of subtarget features
that are enabled/disabled for each FPU, and use it when handling the .fpu
directive.

No functional change in this commit, though clang will start behaving
differently once it starts using this.

Differential Revision: http://reviews.llvm.org/D10237

llvm-svn: 239150
2015-06-05 13:29:24 +00:00
Toma Tabacu 399a56d771 Revert "[mips] [IAS] Restore STI.FeatureBits in .set pop." (r239144).
This is breaking the Windows buildbots.

llvm-svn: 239145
2015-06-05 12:19:27 +00:00
Toma Tabacu 89ebf88ff3 [mips] [IAS] Restore STI.FeatureBits in .set pop.
Summary:
Only restoring AvailableFeatures is not enough and will lead to buggy behaviour.
For example, if we have a feature enabled and we ".set pop", the next time we try
to ".set" that feature nothing will happen because the "!(STI.getFeatureBits()[Feature])"
check will be false, because we didn't restore STI.FeatureBits.

In order to fix this, we need to make MipsAssemblerOptions remember the STI.FeatureBits
instead of the AvailableFeatures and then regenerate AvailableFeatures each time we ".set pop".
This is because, AFAIK, there is no way to convert from AvailableFeatures back to STI.FeatureBits,
but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits).

I also moved the updating of AssemblerOptions inside the "if" statement in
setFeatureBits() and clearFeatureBits(), as there is no reason to update if
nothing changes.

Reviewers: dsanders, mkuper

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9156

llvm-svn: 239144
2015-06-05 11:48:54 +00:00
Jim Grosbach 56ed0bb111 MC: Clean up the naming for MCMachObjectWriter. NFC.
s/ExecutePostLayoutBinding/executePostLayoutBinding/
s/ComputeSymbolTable/computeSymbolTable/
s/BindIndirectSymbols/bindIndirectSymbols/
s/RecordTLVPRelocation/recordTLVPRelocation/
s/RecordScatteredRelocation/recordScatteredRelocation/
s/WriteLinkerOptionsLoadCommand/writeLinkerOptionsLoadCommand/
s/WriteLinkeditLoadCommand/writeLinkeditLoadCommand/
s/WriteNlist/writeNlist/
s/WriteDysymtabLoadCommand/writeDysymtabLoadCommand/
s/WriteSymtabLoadCommand/writeSymtabLoadCommand/
s/WriteSection/writeSection/
s/WriteSegmentLoadCommand/writeSegmentLoadCommand/
s/WriteHeader/writeHeader/

llvm-svn: 239119
2015-06-04 23:25:54 +00:00
Charles Davis da280728b6 [Target/X86] Don't use callee-saved registers in a Win64 tail call on non-Windows.
Summary:
A small bit that I missed when I updated the X86 backend to account for
the Win64 calling convention on non-Windows. Now we don't use dead
non-volatile registers when emitting a Win64 indirect tail call on
non-Windows.

Should fix PR23710.

Test Plan: Added test for the correct behavior based on the case I posted to PR23710.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10258

llvm-svn: 239111
2015-06-04 22:50:05 +00:00
Jim Grosbach 36e60e9127 MC: Clean up naming in MCObjectWriter. NFC.
s/WriteObject/writeObject/
s/RecordRelocation/recordRelocation/
s/IsSymbolRefDifferenceFullyResolved/isSymbolRefDifferenceFullyResolved/
s/Write8/write8/
s/WriteLE16/writeLE16/
s/WriteLE32/writeLE32/
s/WriteLE64/writeLE64/
s/WriteBE16/writeBE16/
s/WriteBE32/writeBE32/
s/WriteBE64/writeBE64/
s/Write16/write16/
s/Write32/write32/
s/Write64/write64/
s/WriteZeroes/writeZeroes/
s/WriteBytes/writeBytes/

llvm-svn: 239108
2015-06-04 22:24:41 +00:00
Colin LeMahieu c40be85adc Revert r239095 incorrect test tree.
llvm-svn: 239102
2015-06-04 21:32:42 +00:00
Jingyue Wu a2f6027a31 [NVPTX] roll forward r239082
NVPTXISelDAGToDAG translates "addrspacecast to param" to
NVPTX::nvvm_ptr_gen_to_param

Added an llc test in bug21465.

llvm-svn: 239100
2015-06-04 21:28:26 +00:00
Colin LeMahieu f99fe00afc [Hexagon] Removing unused variable.
llvm-svn: 239097
2015-06-04 21:22:12 +00:00
Colin LeMahieu fc52c11d80 [Hexagon] Adding functionality for duplexing. Duplexing is a way to compress commonly used pairs of instructions in order to reduce code size. The test case duplex.ll normally would be 8 bytes, assign register to 0 and jump to link register. After duplexing this is only 4 bytes. This also tests the HexagonMCShuffler code path which is used to make sure duplexed instructions still follow slot requirements.
llvm-svn: 239095
2015-06-04 21:16:16 +00:00
Jingyue Wu b8f38668d5 Revert r239082
llc crashed for NVPTX backend

llvm-svn: 239094
2015-06-04 21:07:08 +00:00
Ahmed Bougacha 8207641251 [GlobalMerge] Take into account minsize on Global users' parents.
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.

Differential Revision: http://reviews.llvm.org/D10054

llvm-svn: 239087
2015-06-04 20:39:23 +00:00
Jim Grosbach 7c76b4cc6e MC: Remove obsolete MachO UseAggressiveSymbolFolding.
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.

rdar://21201804

llvm-svn: 239084
2015-06-04 20:27:42 +00:00
Jingyue Wu f3a8079b75 [NVPTX] kernel pointer arguments point to the global address space
Summary:
With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a
pointer in the global address space. This change, along with
NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.*
and st.global.* for accessing kernel pointer arguments.

Minor changes:
1. refactor: extract function convertToPointerInAddrSpace
2. fix a bug in the test case in bug21465.ll

Test Plan: lower-kernel-ptr-arg.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: jholewinski

Subscribers: wengxt, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10154

llvm-svn: 239082
2015-06-04 20:19:38 +00:00
Alexei Starovoitov 310deada10 [bpf] add big- and host- endian support
Summary:
-march=bpf    -> host endian
-march=bpf_le -> little endian
-match=bpf_be -> big endian

Test Plan:
v1 was tested by IBM s390 guys and appears to be working there.
It bit rots too fast here.

Reviewers: chandlerc, tstellarAMD

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10177

llvm-svn: 239071
2015-06-04 19:15:05 +00:00
Matt Arsenault 73e06fa262 R600/SI: Reimplement isLegalAddressingMode
Now that we sometimes know the address space, this can
theoretically do a better job.

This needs better test coverage, but this mostly depends on
first updating the loop optimizatiosn to provide the address
space.

llvm-svn: 239053
2015-06-04 16:17:42 +00:00
Matt Arsenault 81c7ae2bf5 R600/SI: Fix some cases for load / store of half
Mostly argument loads were producing broken zextloads
from an FP type.

llvm-svn: 239049
2015-06-04 16:00:27 +00:00
Benjamin Kramer 50e2a29385 Replace custom fixed endian to raw_ostream emission with EndianStream.
Less code, clearer and more efficient. No functionality change intended.

llvm-svn: 239040
2015-06-04 15:03:02 +00:00
Daniel Sanders 7813ae879e Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC.
Summary:
This is the first of several patches to eliminate StringRef forms of GNU
triples from the internals of LLVM. After this is complete, GNU triples
will be replaced by a more authoratitive representation in the form of
an LLVM TargetTuple.

Reviewers: rengolin

Reviewed By: rengolin

Subscribers: ted, llvm-commits, rengolin, jholewinski

Differential Revision: http://reviews.llvm.org/D10236

llvm-svn: 239036
2015-06-04 13:12:25 +00:00
Elena Demikhovsky 2f1a0dabd0 AVX-512: I brought back vector-shuffle-512-v8.ll test.
I re-generated it after all AVX-512 shuffle optimizations.

llvm-svn: 239026
2015-06-04 07:49:56 +00:00
Elena Demikhovsky 4078c75bd4 AVX-512: added all SKX forms of VPERMW/D/Q instructions.
Added all forms of VPERMPS/PD instrcuctions.
Added encoding tests.

llvm-svn: 239016
2015-06-04 07:07:13 +00:00
Elena Demikhovsky 214335d703 Removed {}, NFC.
llvm-svn: 239014
2015-06-04 07:01:29 +00:00
Rafael Espindola 8c006ee385 Bring back r239006 with a fix.
The fix is just that getOther had not been updated for packing the st_other
values in fewer bits and could return spurious values:

-  unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift;
+  unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift;

Original message:

Pack the MCSymbolELF bit fields into MCSymbol's Flags.

This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.

While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.

llvm-svn: 239012
2015-06-04 05:59:23 +00:00
Rafael Espindola a86ecee52b Revert "Pack the MCSymbolELF bit fields into MCSymbol's Flags."
This reverts commit r239006.

I am debugging the powerpc failures.

llvm-svn: 239010
2015-06-04 05:00:12 +00:00
Rafael Espindola d31203ae21 Pack the MCSymbolELF bit fields into MCSymbol's Flags.
This reduces MCSymolfELF from 64 bytes to 56 bytes on x86_64.

While at it, also make getOther/setOther easier to use by accepting unshifted
STO_* values.

llvm-svn: 239006
2015-06-04 02:32:20 +00:00
Sanjay Patel 667a7e2a0f make reciprocal estimate code generation more flexible by adding command-line options (3rd try)
The first try (r238051) to land this was reverted due to ExecutionEngine build failure;
that was hopefully addressed by r238788.

The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure;
that was hopefully addressed by r238953.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 239001
2015-06-04 01:32:35 +00:00
Tom Stellard 1ba52feb96 R600: Re-enable sub-reg liveness
The bug in the R600 backend that this uncovered has been fixed.

llvm-svn: 238999
2015-06-04 01:20:04 +00:00
Rafael Espindola f8794ff29d Remove MCELFSymbolFlags.h. It is now internal to MCSymbolELF.
llvm-svn: 238996
2015-06-04 00:47:43 +00:00
Rafael Espindola c73aed1cb3 Remove getOrCreateSymbolData. There is no MCSymbolData anymore.
llvm-svn: 238952
2015-06-03 19:03:11 +00:00
Colin LeMahieu 1ce7a11c9c [Hexagon] Test doesn't work on all platforms. At any rate the uninitialized variable issue was fixed. Removing re-registering ASM backend.
llvm-svn: 238949
2015-06-03 18:00:45 +00:00
Colin LeMahieu a675077310 [Hexagon] Reapply 238772 OSABI was not correctly set, added empty_elf test to make sure it is.
llvm-svn: 238947
2015-06-03 17:34:16 +00:00
Matthias Braun 125c9f5f7b ARM: Thumb2 LDRD/STRD supports independent input/output regs
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.

Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.

Differential Revision: http://reviews.llvm.org/D9694

Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.

llvm-svn: 238935
2015-06-03 16:30:24 +00:00
Daniel Sanders 43a79bf694 [arm] Fix r238921. We must handle Constraint_i too.
llvm-svn: 238925
2015-06-03 14:17:18 +00:00
Asaf Badouh 402ebb34af re-apply 238809
AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.
CR:
http://reviews.llvm.org/D9991

llvm-svn: 238923
2015-06-03 13:41:48 +00:00
Daniel Sanders 1f58ef71ea [arm] Distinguish the /U[qytnms]/, 'Uv', 'Q', and 'm' inline assembly memory constraints.
Summary:
But still handle them the same way since I don't know how they differ on
this target.

Of these, /U[qytnms]/ do not have backend tests but are accepted by clang.

No functional change intended.

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D8203

llvm-svn: 238921
2015-06-03 12:33:56 +00:00
Elena Demikhovsky 86224fe468 AVX-512: More code improvements in shuffles, NFC
llvm-svn: 238919
2015-06-03 12:05:03 +00:00
Elena Demikhovsky 21de893377 AVX-512: VSHUFPD instruction selection - code improvements
llvm-svn: 238918
2015-06-03 11:21:01 +00:00
Elena Demikhovsky 9e38086534 AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions for SKX and KNL.
Added tests for encoding.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 238917
2015-06-03 10:56:40 +00:00
Elena Demikhovsky f7e641cc2d X86: Added MPX feature and bound registers.
Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake.
It is a part of KNL and SKX sets. It is also a part of Skylake client.

I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers.

llvm-svn: 238916
2015-06-03 10:30:57 +00:00
Simon Pilgrim 452252e6c8 [X86] Removed (unused) FSRL x86 operation
This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....).

Since the refactoring of the shuffle lowering code this no longer has any use.

Differential Revision: http://reviews.llvm.org/D10169

llvm-svn: 238906
2015-06-03 08:32:36 +00:00
Rafael Espindola cf8beece97 Revert "make reciprocal estimate code generation more flexible by adding command-line options (2nd try)"
This reverts commit r238842.

It broke -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 238900
2015-06-03 05:32:44 +00:00
Rafael Espindola 9aa3ab30a9 Avoid a call to getOrCreateSymbol when we already have the symbol.
llvm-svn: 238890
2015-06-03 00:02:40 +00:00
Rafael Espindola 0ccf9b71f3 Pass a MCSymbolELF to a few ELF only functions. NFC.
llvm-svn: 238868
2015-06-02 21:30:13 +00:00
Rafael Espindola 95fb9b93ed Merge MCELF.h into MCSymbolELF.h.
Now that we have a dedicated type for ELF symbol, these helper functions can
become member function of MCSymbolELF.

llvm-svn: 238864
2015-06-02 20:38:46 +00:00
Tim Northover 3f3a4d8503 AArch64: fix typo in SMIN far atomics and add tests
llvm-svn: 238858
2015-06-02 18:37:20 +00:00
Benjamin Kramer db220dbf02 Push constness through LoopInfo::isLoopHeader and clean it up a bit.
NFC.

llvm-svn: 238843
2015-06-02 15:28:27 +00:00
Sanjay Patel 6f031d848e make reciprocal estimate code generation more flexible by adding command-line options (2nd try)
The first try (r238051) to land this was reverted due to bot failures
that were hopefully addressed by r238788.

This patch adds a TargetRecip class for processing many recip codegen possibilities.
The class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other x86 CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 238842
2015-06-02 15:28:15 +00:00
Elena Demikhovsky 8938f5acca AVX-512: Implemented VRANGESD and VRANGESS instructions for SKX Implemented DAG lowering for all these forms.
Added tests for encoding.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 238834
2015-06-02 14:12:54 +00:00
Elena Demikhovsky 44a129c533 AVX-512: Shorten implementation of lowerV16X32VectorShuffle()
using lowerVectorShuffleWithSHUFPS() and other shuffle-helpers routines.
Added matching of VALIGN instruction.

llvm-svn: 238830
2015-06-02 13:43:18 +00:00
Vasileios Kalintiris bb698c7d5f [mips] Add support for dynamic stack realignment.
Summary:
With this change we are able to realign the stack dynamically, whenever it
contains objects with alignment requirements that are larger than the
alignment specified from the given ABI.

We have to use the $fp register as the frame pointer when we perform
dynamic stack realignment. In complex stack frames, with variably-sized
objects, we reserve additionally the callee-saved register $s7 as the
base pointer in order to reference locals.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8633

llvm-svn: 238829
2015-06-02 13:14:46 +00:00
Renato Golin 3a7bec86bd Revert "ARM: Thumb2 LDRD/STRD supports independent input/output regs"
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.

Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.

llvm-svn: 238821
2015-06-02 11:47:30 +00:00
Vladimir Sukharev 5f6f60d942 [AArch64] Add v8.1a atomic instructions
Patch by: Tom Coxon

Reviewers: t.p.northover

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8501

llvm-svn: 238818
2015-06-02 10:58:41 +00:00
Toma Tabacu 2969650ecd [mips] [IAS] Add support for the .set softfloat/hardfloat directives.
Summary: These directives are used to set the current value of the SoftFloat feature.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, mpf

Differential Revision: http://reviews.llvm.org/D9074

llvm-svn: 238813
2015-06-02 09:48:04 +00:00
Elena Demikhovsky 3425c932da AVX-512: Implemented VFIXUPIMMSD and VFIXUPIMMSS instructions for KNL
Implemented DAG lowering for all these forms.
Added tests for encoding.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 238811
2015-06-02 08:28:57 +00:00
Asaf Badouh 8d897dd05f revert 238809
llvm-svn: 238810
2015-06-02 07:45:19 +00:00
Asaf Badouh 17de10f37e AVX-512: Implemented GETEXP instruction for KNL and SKX
Added rounding mode modifier for SQRTPS/PD
Added tests for encoding and intrinsics.

llvm-svn: 238809
2015-06-02 07:18:14 +00:00
Rafael Espindola a869576008 Create a MCSymbolELF.
This create a MCSymbolELF class and moves SymbolSize since only ELF
needs a size expression.

This reduces the size of MCSymbol from 56 to 48 bytes.

llvm-svn: 238801
2015-06-02 00:25:12 +00:00
Matthias Braun e20dc1cd3a ARM: Thumb2 LDRD/STRD supports independent input/output regs
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.

Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.

Differential Revision: http://reviews.llvm.org/D9694

llvm-svn: 238795
2015-06-01 23:27:08 +00:00
Matthias Braun 72b8f74813 AArch64: Use CMP;CCMP sequences for and/or/setcc trees.
Previously CCMP/FCCMP instructions were only used by the
AArch64ConditionalCompares pass for control flow. This patch uses them
for SELECT like instructions as well by matching patterns in ISelLowering.

PR20927, rdar://18326194

Differential Revision: http://reviews.llvm.org/D8232

llvm-svn: 238793
2015-06-01 22:31:17 +00:00
Alexei Starovoitov dadc97767f [bpf] fix build
fix breakage due to r238634

Patch by Vijay Subramanian.

llvm-svn: 238792
2015-06-01 22:24:36 +00:00
Matt Arsenault a0269b6d20 R600/SI: Don't hardcode pointer type
llvm-svn: 238789
2015-06-01 21:58:24 +00:00
Matthias Braun ec50fa6f8c ARMLoadStoreOptimizer: Fix doxygen comments; NFC
llvm-svn: 238784
2015-06-01 21:26:23 +00:00
Rafael Espindola b5815b4738 Revert "[Hexagon] Adding basic ELF relocation generation and testing advanced relaxation codepath."
This reverts commit r238748.

It broke the msan bot:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/4372/steps/check-llvm%20msan/logs/stdio

llvm-svn: 238772
2015-06-01 19:20:47 +00:00
Vasileios Kalintiris cbbf8e0a39 [mips][FastISel] Implement bswap.
Summary: Implement bswap intrinsic for MIPS FastISel. It's very different for misp32 r1/r2 .

Based on a patch by Reed Kotler.

Test Plan:
bswap1.ll
test-suite

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D7219

llvm-svn: 238760
2015-06-01 16:40:45 +00:00
Vasileios Kalintiris bdb91b31f0 [mips][FastISel] Implement intrinsics memset, memcopy & memmove.
Summary:
Implement the intrinsics memset, memcopy and memmove in MIPS FastISel.
Make some needed infrastructure fixes so that this can work.

Based on a patch by Reed Kotler.

Test Plan:
memtest1.ll
The patch passes test-suite for mips32 r1/r2 and at O0/O2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D7158

llvm-svn: 238759
2015-06-01 16:36:01 +00:00
Vasileios Kalintiris 8fcb3986d0 [mips][FastISel] Implement srem/urem and sdiv/udiv instructions.
Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel.

Based on a patch by Reed Kotler.

Test Plan:
srem1.ll
div1.ll
test-suite at O0/O2 for mips32 r1/r2

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D7028

llvm-svn: 238757
2015-06-01 16:17:37 +00:00
Vasileios Kalintiris 127f894b55 [mips][FastISel] Implement the select statement for MIPS FastISel.
Summary: Implement the LLVM IR select statement for MIPS FastISelsel.

Based on a patch by Reed Kotler.

Test Plan:
"Make check" test included now.
Passes test-suite at O2/O0 mips32 r1/r2.

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6774

llvm-svn: 238756
2015-06-01 15:56:40 +00:00
Vasileios Kalintiris 7f680e156e [mips][FastISel] Clobber HI0/LO0 registers in MUL instructions.
Summary:
The contents of the HI/LO registers are unpredictable after the execution of
the MUL instruction. In addition to implicitly defining these registers in the
MUL instruction definition, we have to mark those registers as dead too.

Without this the fast register allocator is running out of registers when the
MUL instruction is followed by another one that tries to allocate the AC0
register.

Based on a patch by Reed Kotler.

Reviewers: dsanders, rkotler

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D9825

llvm-svn: 238755
2015-06-01 15:48:09 +00:00
Rafael Espindola 7f7caf9167 Fix relocation selection for foo-. on mips.
This handles only the 32 bit case.

llvm-svn: 238751
2015-06-01 15:10:51 +00:00
Rafael Espindola ccb8d1a114 Simplify code, NFC.
llvm-svn: 238750
2015-06-01 14:58:29 +00:00
Colin LeMahieu a739a4b3c7 [Hexagon] Adding basic ELF relocation generation and testing advanced relaxation codepath.
llvm-svn: 238748
2015-06-01 14:51:26 +00:00
Elena Demikhovsky 67afb630e1 AVX-512: Optimized vector shuffle for v16f32 and v16i32 types.
llvm-svn: 238743
2015-06-01 13:26:18 +00:00
Luke Cheeseman 85fd06d389 Re-commit of r238201 with fix for building with shared libraries.
llvm-svn: 238739
2015-06-01 12:02:47 +00:00
Elena Demikhovsky 3582eb3b39 AVX-512: Implemented VRANGEPD and VRANGEPD instructions for SKX.
Implemented DAG lowering for all these forms.
Added tests for encoding.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 238738
2015-06-01 11:05:34 +00:00
Elena Demikhovsky 0c41088ebf AVX-512: Implemented vector shuffle lowering for v8i64 and v8f64 types.
I removed the vector-shuffle-512-v8.ll, it is auto-generated test, not valid any more.

llvm-svn: 238735
2015-06-01 09:49:53 +00:00
Elena Demikhovsky 75ede68793 AVX-512: added all forms of VPSHUFD and VPSHUFHW, VPSHUFLW
including encodings.

llvm-svn: 238729
2015-06-01 07:17:23 +00:00
Elena Demikhovsky 42c96d9c0a AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKX
Implemented DAG lowering for all these forms.
Added tests for encoding.

by Igor Breger (igor.breger@intel.com)

llvm-svn: 238728
2015-06-01 06:50:49 +00:00
Elena Demikhovsky dd68d0cb0f AVX-512: Fixed a bug in compress and expand intrinsics.
By Igor Breger (igor.breger@intel.com)

llvm-svn: 238724
2015-06-01 06:30:13 +00:00
Matt Arsenault bd7d80a4a6 Add address space argument to isLegalAddressingMode
This is important because of different addressing modes
depending on the address space for GPU targets.

This only adds the argument, and does not update
any of the uses to provide the correct address space.

llvm-svn: 238723
2015-06-01 05:31:59 +00:00
Rafael Espindola 5eb02e45e3 Simplify another function that doesn't fail.
llvm-svn: 238703
2015-06-01 00:27:26 +00:00
NAKAMURA Takumi 072a58a7fd ARMConstantIslandPass.cpp: Prune an empty \brief. [-Wdocumentation]
llvm-svn: 238697
2015-05-31 23:05:35 +00:00
Colin LeMahieu a97365b8e0 [Hexagon] Including raw_ostream for debug builds.
llvm-svn: 238695
2015-05-31 22:29:33 +00:00
Colin LeMahieu b819d3c465 [Hexagon] classes are actually structs.
llvm-svn: 238694
2015-05-31 22:18:42 +00:00
Colin LeMahieu b23c47bab3 [Hexagon] Adding MC packet shuffler.
llvm-svn: 238692
2015-05-31 21:57:09 +00:00
Tim Northover a603c4076c ARM: recommit r237590: allow jump tables to be placed as constant islands.
The original version didn't properly account for the base register
being modified before the final jump, so caused miscompilations in
Chromium and LLVM. I've fixed this and tested with an LLVM self-host
(I don't have the means to build & test Chromium).

The general idea remains the same: in pathological cases jump tables
can be too far away from the instructions referencing them (like other
constants) so they need to be movable.

Should fix PR23627.

llvm-svn: 238680
2015-05-31 19:22:07 +00:00
Colin LeMahieu b510fb38f5 [Hexagon] Adding override specifier and removing erroneous assertion
llvm-svn: 238664
2015-05-30 20:03:07 +00:00
Colin LeMahieu 86f218e7ec [Hexagon] Adding basic relaxation functionality.
llvm-svn: 238660
2015-05-30 18:55:47 +00:00
Simon Pilgrim f19ef9f741 Stripped trailing whitespace. NFC.
llvm-svn: 238654
2015-05-30 13:01:42 +00:00
Renato Golin 5d78c9ce58 Comment change. NFC
That comment misleads the current discussions in mentioned bug. Leave
the discussions to the bug. Also, adding a future change FIXME.

llvm-svn: 238653
2015-05-30 10:44:07 +00:00
Chandler Carruth cb58910ce8 [x86] Unify the horizontal adding used for popcount lowering taking the
best approach of each.

For vNi16, we use SHL + ADD + SRL pattern that seem easily the best.

For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both are the exact same uops though. Hard to be confident of
anything past that.

If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHl
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with both SSE3, SSSE3, and AVX selected as
I saw unique behavior in each of these buckets with IACA all of which
should be checked against measured performance.

But this patch is still a useful improvement by dropping duplicate work
and getting the much nicer PSADBW lowering for v2i64.

I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.

llvm-svn: 238652
2015-05-30 10:35:03 +00:00
Renato Golin 230d298320 [ARMTargetParser] Move IAS arch ext parser. NFC
The plan was to move the whole table into the already existing ArchExtNames
but some fields depend on a table-generated file, and we don't yet have this
feature in the generic lib/Support side.

Once the minimum target-specific table-generated files are available in a
generic fashion to these libraries, we'll have to keep it in the ASM parser.

llvm-svn: 238651
2015-05-30 10:30:02 +00:00
Chandler Carruth 11e6f8fed1 [x86] Split out the horizontal byte sum lowering component of the LUT
lowering into a helper function.

NFC.

llvm-svn: 238650
2015-05-30 09:46:16 +00:00
Chandler Carruth 9cc2516676 [x86] Replace the long spelling of getting a bitcast with the *much*
shorter one. NFC.

In addition to being much shorter to type and requiring fewer arguments,
this change saves over 30 lines from this one file, all wasted on total
boilerplate...

llvm-svn: 238640
2015-05-30 04:23:13 +00:00
Chandler Carruth 060cdca996 [x86] Replace the long spelling of getting a bitcast with the new short
spelling. NFC.

llvm-svn: 238639
2015-05-30 04:19:57 +00:00
Chandler Carruth 502b23a7a9 [sdag] Add the helper I most want to the DAG -- building a bitcast
around a value using its existing SDLoc.

Start using this in just one function to save omg lines of code.

llvm-svn: 238638
2015-05-30 04:14:10 +00:00
Chandler Carruth 2599da3cfd [x86] Restore the bitcasts I removed when refactoring this to avoid
shifting vectors of bytes as x86 doesn't have direct support for that.

This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.

In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repeatative logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.

llvm-svn: 238637
2015-05-30 04:05:11 +00:00
Chandler Carruth 6ba9730a4e [x86] Implement a faster vector population count based on the PSHUFB
in-register LUT technique.

Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html

The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.

On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.

The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.

I have extended Bruno's patch in the following ways:

0) I committed the new tests with baseline sequences so this shows
   a diff, and regenerated the tests using the update scripts.

1) Bruno had noticed and mentioned in IRC a redundant mask that
   I removed.

2) I introduced a particular optimization for the i32 vector cases where
   we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
   + PSADBW to compute doubled high i32 popcounts. This takes advantage
   of the fact that to line up the high i32 popcounts we have to shift
   them anyways, and we can shift them by one fewer bit to effectively
   divide the count by two. While the PSHUFD based horizontal add is no
   faster, it doesn't require registers or load traffic the way a mask
   would, and provides more ILP as it happens on different ports with
   high throughput.

3) I did some code cleanups throughout to simplify the implementation
   logic.

4) I refactored it to continue to use the parallel bitmath lowering when
   SSSE3 is not available to preserve the performance of that version on
   SSE2 targets where it is still much better than scalarizing as we'll
   still do a bitmath implementation of popcount even in scalar code
   there.

With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.

With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.

Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.

Differential Revision: http://reviews.llvm.org/D10084

llvm-svn: 238636
2015-05-30 03:20:59 +00:00
Chandler Carruth c2e400de83 [x86] Restructure the parallel bitmath lowering of popcount into
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.

This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.

The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.

The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certaily fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.

llvm-svn: 238635
2015-05-30 03:20:55 +00:00
Jim Grosbach 13760bd152 MC: Clean up MCExpr naming. NFC.
llvm-svn: 238634
2015-05-30 01:25:56 +00:00
Reid Kleckner e6531a5588 [WinEH] Adjust the 32-bit SEH prologue to better match reality
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.

This is in preparation for lowering the llvm.eh.exceptioninfo().

llvm-svn: 238627
2015-05-29 22:57:46 +00:00
Reid Kleckner 173a72524f Disable FP elimination in funcs using 32-bit MSVC EH personalities
The value in 'ebp' acts as an implicit argument to the outlined
handlers, and is recovered with frameaddress(1).

llvm-svn: 238619
2015-05-29 21:58:11 +00:00
Rafael Espindola 4d37b2a259 Remove getData.
This completes the mechanical part of merging MCSymbol and MCSymbolData.

llvm-svn: 238617
2015-05-29 21:45:01 +00:00
Reid Kleckner 5b8ebfbc25 Only add the EH state insertion pass on 32-bit Windows
llvm-svn: 238612
2015-05-29 20:43:10 +00:00
Rafael Espindola beb6060a51 Remove the MCSymbolData typedef.
The getData member function is next.

llvm-svn: 238611
2015-05-29 20:41:47 +00:00
Rafael Espindola b5d316bfc3 Rename getOrCreateSymbolData to registerSymbol and return void.
Another step in merging MCSymbol and MCSymbolData.

llvm-svn: 238607
2015-05-29 20:21:02 +00:00
Rafael Espindola e3b2acf274 Pass MCSymbols to the helper functions in MCELF.h.
llvm-svn: 238596
2015-05-29 18:47:23 +00:00
Rafael Espindola ece40ca43d Pass a MCSymbol to needsRelocateWithSymbol.
llvm-svn: 238589
2015-05-29 18:26:09 +00:00
Nemanja Ivanovic 376e17364f Add support for VSX FMA single-precision instructions to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9941

It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.

llvm-svn: 238578
2015-05-29 17:13:25 +00:00
Reid Kleckner 1d3d4adbb9 [WinEH] Emit EH tables for __CxxFrameHandler3 on 32-bit x86
Small (really small!) C++ exception handling examples work on 32-bit x86
now.

This change disables the use of .seh_* directives in WinException when
CFI is not in use. It also uses absolute symbol references in the tables
instead of imagerel32 relocations.

Also fixes a cache invalidation bug in MMI personality classification.

llvm-svn: 238575
2015-05-29 17:00:57 +00:00
Jingyue Wu 995dde2799 [NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast
Summary:
This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast
from longer chains consisting of GEPs and BitCasts. For example, it can
now optimize

  %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]*
  %1 = gep [10 x float]* %0, i64 0, i64 %i
  %2 = bitcast float* %1 to i32*
  %3 = load i32* %2 ; emits ld.u32

to

  %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i
  %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)*
  %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32

Test Plan: @ld_int_from_global_float in access-non-generic.ll

Reviewers: broune, eliben, jholewinski, meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10074

llvm-svn: 238574
2015-05-29 17:00:27 +00:00
Colin LeMahieu 68d967d92e [Hexagon] Disassembling, printing, and emitting instructions a whole-bundle at a time which is the semantic unit for Hexagon. Fixing tests to use the new format. Disabling tests in the direct object emission path for a followup patch.
llvm-svn: 238556
2015-05-29 14:44:13 +00:00
Toma Tabacu b45fb36f20 [mips] Remove 2 unused variables in MipsTargetStreamer.cpp. NFC.
llvm-svn: 238554
2015-05-29 13:52:56 +00:00
Matthias Braun e41e146c16 CodeGen: Use mop_iterator instead of MIOperands/ConstMIOperands
MIOperands/ConstMIOperands are classes iterating over the MachineOperand
of a MachineInstr, however MachineInstr::mop_iterator does the same
thing.

I assume these two iterators exist to have a uniform interface to
iterate over the operands of a machine instruction bundle and a single
machine instruction. However in practice I find it more confusing to have 2
different iterator classes, so this patch transforms (nearly all) the
code to use mop_iterators.

The only exception being MIOperands::anlayzePhysReg() and
MIOperands::analyzeVirtReg() still needing an equivalent, I leave that
as an exercise for the next patch.

Differential Revision: http://reviews.llvm.org/D9932

This version is slightly modified from the proposed revision in that it
introduces MachineInstr::getOperandNo to avoid the extra counting
variable in the few loops that previously used MIOperands::getOperandNo.

llvm-svn: 238539
2015-05-29 02:56:46 +00:00
Reid Kleckner fe4d491bd9 [WinEH] Start inserting state number stores for C++ EH
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.

Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.

llvm-svn: 238514
2015-05-28 22:00:24 +00:00
Rafael Espindola 3a5d3cce80 Remove a trivial forwarding function. NFC.
llvm-svn: 238506
2015-05-28 21:36:02 +00:00
Reid Kleckner bfcad2f181 Remove debug prints from r238487
llvm-svn: 238501
2015-05-28 21:23:53 +00:00
Reid Kleckner 80956a0142 Disable x86 tail call optimizations that jump through GOT
For x86 targets, do not do sibling call optimization when materializing
the callee's address would require a GOT relocation. We can still do
tail calls to internal functions, hidden functions, and protected
functions, because they do not require this kind of relocation. It is
still possible to get GOT relocations when the user explicitly asks for
it with musttail or -tailcallopt, both of which are supposed to
guarantee TCO.

Based on a patch by Chih-hung Hsieh.

Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk

Subscribers: joerg, davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D9799

llvm-svn: 238487
2015-05-28 20:44:28 +00:00
Peter Collingbourne 450fbee6b2 Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achiveing better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.

This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which performs a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.

Fixes PR9199.

Differential Revision: http://reviews.llvm.org/D9508

llvm-svn: 238473
2015-05-28 20:02:45 +00:00
Reid Kleckner e2e57faa7d [WinEH] Remove debugging dump() call
llvm-svn: 238472
2015-05-28 20:02:05 +00:00
Chad Rosier adc06311ba Reuse Loc variable. NFC.
llvm-svn: 238448
2015-05-28 18:18:21 +00:00