Commit Graph

Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context, it is interesting to expose such patterns in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}
declare void @dummy(i64, i64, i64)

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot in code that uses 32-bit indices on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them into the
loads at the beginning of the computation chain by promoting the whole chain to
the extended type. The promotion is done if and only if we do not introduce new
extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing modes
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.
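
A minimal sketch of what the opt-in looks like on the target side. This is
only illustrative; the exact member/setter spelling is an assumption, not
necessarily this patch's:

  // Sketch: the patch adds a TargetLowering switch, off by default, so
  // existing targets keep the old "move ext to load" behavior unchanged.
  X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM)
      : TargetLowering(TM) {
    // Let CodeGenPrepare promote the operations sitting between a load
    // and its extension so the extension can fold into the load.
    setEnableExtLdPromotion(true); // hypothetical setter name
  }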


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Kevin Enderby adb7c43c40 Fix the arm build bots for a test that was added. A printing routine was incorrectly using PRIx32
when it should have been using PRIx64 for the value that was passed as uint64_t.
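
A minimal illustration of the fix (hypothetical function, not the actual
printing routine):

  #include <cinttypes>
  #include <cstdio>

  void printValue(uint64_t Value) {
    // PRIx32 promises a 32-bit vararg; passing a uint64_t with it is
    // undefined behavior and is what broke the ARM bots.
    printf("0x%" PRIx64 "\n", Value); // format now matches uint64_t
  }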

llvm-svn: 224350
2014-12-16 18:58:11 +00:00
Robert Khasanov d04cd2fbfe [AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets.
Added lowering tests.

llvm-svn: 224349
2014-12-16 18:24:07 +00:00
Evgeny Astigeevich b42003d2bf On behalf of Matthew Wahab:
An instruction alias defined with InstAlias and an optional operand in the
middle of the AsmString field, "..${a} <operands>", would get the final
"}" printed in the instruction disassembly. This wouldn't happen if the optional
operand appeared as the last item in the AsmString, which is how the current
backends avoided the problem.

There don't appear to be any tests for this part of Tablegen but it passes the
pre-commit tests. Manually tested the change by enabling the generic alias
printer in the ARM backend and checking the output.

Differential Revision: http://reviews.llvm.org/D6529

llvm-svn: 224348
2014-12-16 18:16:17 +00:00
Ahmed Bougacha 0dc1979293 [MC] Reset the MCInst in the matcher function before adding opcode/operands.
On X86, the Intel asm parser tries to match all memory operand sizes when
none is explicitly specified.  For LEA, which doesn't really have a memory
operand (just a pointer one), this results in multiple successful matches,
one for each memory size.  There's no error because it's the same opcode, so
really, it's just one match.  However, the tablegen'd matcher function
adds opcode/operands to the passed MCInst, and this results in multiple
duplicated operands.

This commit clears the MCInst in the tablegen'd matcher function.
We sometimes clear it when the match failed, so there's no expectation of
keeping the previous content anyway.
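
A sketch of the reset (the helper name is illustrative; the real change
lives in the tablegen'd matcher function):

  #include "llvm/MC/MCInst.h"

  static void beginMatchAttempt(llvm::MCInst &Inst, unsigned Opcode) {
    Inst.clear();           // drop operands appended by a prior candidate
    Inst.setOpcode(Opcode); // start this match attempt from a clean slate
  }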

Differential Revision: http://reviews.llvm.org/D6670

llvm-svn: 224347
2014-12-16 18:05:28 +00:00
Colin LeMahieu 1944a8cd04 [Hexagon] Adding absolute value and negate with saturation
llvm-svn: 224346
2014-12-16 17:44:49 +00:00
Zachary Turner 3e9ca3a625 Delete MSVC intermediate files on "make clean" from tests.
lld-link shells out to MSVC for certain types of work, and this
results in some MSVC output files being generated even though
clang / lld are the compiler / linker.  This is expected, so we
make sure to clean these output files on make clean.

llvm-svn: 224345
2014-12-16 16:48:19 +00:00
Sanjay Patel e46d54f0bf combine consecutive subvector 16-byte loads into one 32-byte load
This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector,
we can use a single 32-byte load instead. 
But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops.
We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2)
because those operands would have to be split in half anyway since there is no support for
32-byte integer math ops.
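
An illustrative source pattern (written with intrinsics here; the patch
itself operates on the merged load at the SelectionDAG level):

  #include <immintrin.h>

  // Two consecutive 16-byte loads widened into one 32-byte vector; after
  // this change the backend can emit a single 32-byte load here, except
  // on Sandy Bridge / Ivy Bridge.
  __m256 concat_loads(const float *p) {
    __m128 lo = _mm_loadu_ps(p);     // bytes 0..15
    __m128 hi = _mm_loadu_ps(p + 4); // bytes 16..31
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
  }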

Differential Revision: http://reviews.llvm.org/D6492

llvm-svn: 224344
2014-12-16 16:30:01 +00:00
Colin LeMahieu 455f24aa77 [Hexagon] Adding saturate and swizzle instructions.
llvm-svn: 224343
2014-12-16 16:27:17 +00:00
Marshall Clow d60fe11919 Re-commit the test for regex that I busted last night - now passes under ASAN
llvm-svn: 224342
2014-12-16 16:22:43 +00:00
Robert Khasanov 8d9b93eac8 [AVX512] Add a comment for avx512_broadcast_pat multiclass
llvm-svn: 224341
2014-12-16 16:12:11 +00:00
Colin LeMahieu d9b23509bf [Hexagon] Removing old multiply defs and updating references to new versions.
llvm-svn: 224340
2014-12-16 16:10:01 +00:00
Vladimir Medic e88609388a The single check for N64 inside MipsDisassemblerBase's subclasses is actually wrong. It should be testing for FeatureGP64bit. There are no functional changes.
llvm-svn: 224339
2014-12-16 15:29:12 +00:00
Zoran Jovanovic 2deca34803 [mips][microMIPS] Implement SWP and LWP instructions
Differential Revision: http://reviews.llvm.org/D5667

llvm-svn: 224338
2014-12-16 14:59:10 +00:00
Aaron Ballman 0d6a010c13 Fixing -Wsign-compare warnings; NFC.
llvm-svn: 224337
2014-12-16 14:04:11 +00:00
Vladimir Medic a489a631ae Add disassembler tests for mips4 platform. There are no functional changes.
llvm-svn: 224335
2014-12-16 13:02:25 +00:00
Elena Demikhovsky f5b72afff4 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.
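
An example of a loop shape this enables (illustrative only, not one of the
added tests):

  // The conditional access used to block vectorization; on targets that
  // report support, the condition now becomes the mask of a masked
  // load/store intrinsic.
  void add_if(int n, const int *trigger, float *a, const float *b) {
    for (int i = 0; i < n; ++i)
      if (trigger[i] > 0)
        a[i] = a[i] + b[i];
  }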

http://reviews.llvm.org/D6527

llvm-svn: 224334
2014-12-16 11:50:42 +00:00
Daniel Sanders 97797a8a0f [mips] Fix arguments-struct.ll for Windows and OSX hosts.
llvm-svn: 224333
2014-12-16 11:21:58 +00:00
Bradley Smith ececb7f6e2 [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 224332
2014-12-16 10:59:27 +00:00
Hafiz Abid Qadeer 547cb78774 Fixed 2 typos in help.
llvm-svn: 224331
2014-12-16 10:20:35 +00:00
Elena Demikhovsky a79fc16bb0 X86: Added FeatureVectorUAMem for all AVX architectures.
According to AVX specification:

"Most arithmetic and data processing instructions encoded using the VEX prefix and
performing memory accesses have more flexible memory alignment requirements
than instructions that are encoded without the VEX prefix. Specifically,
With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions,
most VEX-encoded, arithmetic and data processing instructions operate in
a flexible environment regarding memory address alignment, i.e. VEX-encoded
instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix
operate normally without causing #GP(0) on any byte-granularity alignment
(unlike Legacy SSE instructions)."

The same for AVX-512.

This change does not affect anything right now, because only the "memop pattern fragment"
depends on FeatureVectorUAMem and it is not used in AVX patterns.
All AVX patterns are based on the "unaligned load" anyway.

llvm-svn: 224330
2014-12-16 09:10:08 +00:00
Alexey Bataev 07649fb7c5 Renamed RefersToEnclosingLocal bitfield to RefersToCapturedVariable.
Bitfield RefersToEnclosingLocal of Stmt::DeclRefExprBitfields renamed to RefersToCapturedVariable to reflect the latest changes introduced in commit 224323. Also renamed method Expr::refersToEnclosingLocal() to Expr::refersToCapturedVariable() and updated comments for constant arguments.
No functional changes.

llvm-svn: 224329
2014-12-16 08:01:48 +00:00
Duncan P. N. Exon Smith 8c66273d0c Remove 'metadata' from comments
llvm-svn: 224328
2014-12-16 07:45:05 +00:00
Duncan P. N. Exon Smith bb7d2fb1e5 IR: Stop printing 'metadata' in Metadata::print()
Stop printing `metadata` in `Metadata::print()` and
`Metadata::printAsOperand()`.

llvm-svn: 224327
2014-12-16 07:40:31 +00:00
Mohit K. Bhakkad a94a037528 internal_stat for mips64
llvm-svn: 224326
2014-12-16 07:11:08 +00:00
Duncan P. N. Exon Smith fee167fcc0 IR: Make MDNode::dump() useful by adding addresses
It's horrible to inspect `MDNode`s in a debugger.  All of their operands
that are `MDNode`s get dumped as `<badref>`, since we can't assign
metadata slots in the context of a `Metadata::dump()`.  (Why not?  Why
not assign numbers lazily?  Because then each time you called `dump()`,
a given `MDNode` could have a different lazily assigned number.)

Fortunately, the C memory model gives us perfectly good identifiers for
`MDNode`.  Add pointer addresses to the dumps, transforming this:

    (lldb) e N->dump()
    !{i32 662302, i32 26, <badref>, null}

    (lldb) e ((MDNode*)N->getOperand(2))->dump()
    !{i32 4, !"foo"}

into:

    (lldb) e N->dump()
    !{i32 662302, i32 26, <0x100706ee0>, null}

    (lldb) e ((MDNode*)0x100706ee0)->dump()
    !{i32 4, !"foo"}

and this:

    (lldb) e N->dump()
    0x101200248 = !{<badref>, <badref>, <badref>, <badref>, <badref>}

    (lldb) e N->getOperand(0)
    (const llvm::MDOperand) $0 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(1)
    (const llvm::MDOperand) $1 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(2)
    (const llvm::MDOperand) $2 = {
      MD = 0x0000000101200058
    }
    (lldb) e N->getOperand(3)
    (const llvm::MDOperand) $3 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(4)
    (const llvm::MDOperand) $4 = {
      MD = 0x0000000101200058
    }
    (lldb) e ((MDNode*)0x00000001012004e0)->dump()
    !{}

    (lldb) e ((MDNode*)0x0000000101200058)->dump()
    !{null}

into:

    (lldb) e N->dump()
    !{<0x1012004e0>, <0x1012004e0>, <0x101200058>, <0x1012004e0>, <0x101200058>}

    (lldb) e ((MDNode*)0x1012004e0)->dump()
    !{}

    (lldb) e ((MDNode*)0x101200058)->dump()
    !{null}

llvm-svn: 224325
2014-12-16 07:09:37 +00:00
Duncan P. N. Exon Smith 68c37245d3 DebugInfo: Update testcase to actually check something
This test was missing a `Debug Info Version`, so its `not grep` was
passing vacuously.  Update it to CHECK for something useful at the same
time so it doesn't bitrot quite so easily in the future.

llvm-svn: 224324
2014-12-16 07:08:19 +00:00
Alexey Bataev f841bd9fcd [OPENMP] Bugfix for processing of global variables in OpenMP regions.
Currently, if a global variable is marked as a private OpenMP variable, the compiler crashes in debug builds or generates incorrect code in release builds. This happens because inside the OpenMP region the original global variable is used instead of the generated private copy: global variables are currently not captured in the OpenMP region.
This patch adds capturing of global variables iff a private copy of the global variable must be used in the OpenMP region.
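
An illustrative trigger (a sketch, not the actual testcase):

  int g; // global

  void f() {
  #pragma omp parallel private(g) // global marked private: previously
    {                             // crashed (debug) or miscompiled (release)
      g = 0;                      // must refer to the captured private copy
    }
  }
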
Differential Revision: http://reviews.llvm.org/D6259

llvm-svn: 224323
2014-12-16 07:00:22 +00:00
David Majnemer e7029bce82 Sema: Don't crash converting to bool from _Atomic
Turning our _Atomic L-value into an R-value removes its _Atomic-ness.
However, we didn't update our 'FromType', which made
ScalarTypeToBooleanCastKind think we were trying to pass it a
non-scalar.

This fixes PR21836.
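
A reduced trigger (a sketch; the actual PR testcase may differ). Clang
accepts _Atomic in C++ as an extension, so this also compiles as C++:

  _Atomic(int) Flag;

  int main() {
    return Flag ? 1 : 0; // lvalue-to-rvalue conversion drops _Atomic; the
                         // bool conversion used to see the stale type
  }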

llvm-svn: 224322
2014-12-16 06:31:17 +00:00
Jason Molenda 0f479da711 Temporarily disable CompactUnwindInfo::GetCompactUnwindInfoForFunction.
The compact unwind importer is getting the wrong unwind info for one
case that I found.  I haven't been able to fix the problem tonight 
and I don't want to leave TOT behaving incorrectly, so just ignore
compact unwind until I can get to the bottom of this.

llvm-svn: 224321
2014-12-16 06:20:30 +00:00
Nick Lewycky f0202ca775 Improve handling of value dependent expressions in __attribute__((enable_if)), both in the condition expression and at the call site. Fixes PR20988!
llvm-svn: 224320
2014-12-16 06:12:01 +00:00
Saleem Abdulrasool 417fc6b303 ARM: diagnose deprecated syntax
The use of SP and PC in the register list for stores is deprecated on ARM
(ARM ARM A.8.8.199):

  ARM deprecates the use of ARM instructions that include the SP or the PC in
  the list.

Provide a deprecation warning from the assembler in the case that the syntax is
ever seen.
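
For instance (illustrative; any STM with SP or PC in the register list
qualifies):

  // Built for a 32-bit ARM target, this now draws the deprecation warning:
  void store_regs() {
    asm volatile("stmia r0!, {r1, sp}" ::: "memory"); // SP in the list
  }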

llvm-svn: 224319
2014-12-16 05:53:25 +00:00
Hal Finkel 8adf2254ef [PowerPC] Improve instruction selection bit-permuting operations (32-bit)
The PowerPC backend, somewhat embarrassingly, did not generate an
optimal-length sequence of instructions for a 32-bit bswap. While adding a
pattern for the bswap intrinsic to fix this would not have been terribly
difficult, doing so would not have addressed the real problem: we had been
generating poor code for many bit-permuting operations (by which I mean things
like byte-swap that permute the bits of one or more inputs around in various
ways). Here are some initial steps toward solving this deficiency.

Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL,
SHL, SRL, AND and OR (mostly with constant second operands). Looking back
through these operations, we can build up a description of the bits in the
resulting value in terms of bits of one or more input values (and constant
zeros). For each bit, we compute the rotation amount from the original value,
and then collect consecutive bits sharing the same (value, rotation factor)
pair into groups. These groups are then sorted, and we can instruction-select
the entire permutation using a combination of masked rotations (rlwinm),
immediate ands (andi/andis), and masked rotation inserts (rlwimi).
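
A sketch of the bookkeeping this implies (field names are an assumption,
not necessarily the patch's exact code):

  #include "llvm/CodeGen/SelectionDAGNodes.h"

  // One contiguous run of result bits that all come from the same input
  // value rotated by the same amount; runs like this map naturally onto
  // rlwinm/rlwimi.
  struct BitGroup {
    llvm::SDValue V;   // input value supplying these bits
    unsigned RLAmt;    // left-rotation applied to V
    unsigned StartIdx; // first result bit in the run
    unsigned EndIdx;   // last result bit in the run
  };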

The result is that instead of lowering an i32 bswap as:

	rlwinm 5, 3, 24, 16, 23
	rlwinm 4, 3, 24, 0, 7
	rlwimi 4, 3, 8, 8, 15
	rlwimi 5, 3, 8, 24, 31
	rlwimi 4, 5, 0, 16, 31

we now produce:

	rlwinm 4, 3, 8, 0, 31
	rlwimi 4, 3, 24, 16, 23
	rlwimi 4, 3, 24, 0, 7

and for the 'test6' example in the PowerPC/README.txt file:

 unsigned test6(unsigned x) {
   return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
 }

we used to produce:

	lis 4, 255
	rlwinm 3, 3, 16, 0, 31
	ori 4, 4, 255
	and 3, 3, 4

and now we produce:

	rlwinm 4, 3, 16, 24, 31
	rlwimi 4, 3, 16, 8, 15

and, as a nice bonus, this fixes the FIXME in
test/CodeGen/PowerPC/rlwimi-and.ll.

This commit does not include instruction-selection for i64 operations, those
will come later.

llvm-svn: 224318
2014-12-16 05:51:41 +00:00
Justin Bogner 1e8019f5b9 Revert "Fix installheaders target's permissions"
The install of headers excludes the support directory, so these chmod
calls fail on non-existent directories, as seen on this bot:

http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/2801/console

This reverts commit r224300.

llvm-svn: 224317
2014-12-16 05:28:07 +00:00
David Majnemer 9963adeaef AST: Fix the linkage of static vars in fn template specializations
We forgot that static variables in function template specializations were
externally visible.  The manglers assumed that externally visible static
variables were numbered in Sema.  We would end up mangling static
variables in the same specialization with the same mangling number which
would give all of them the same name.
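
A reduced shape of the problem (illustrative, not the PR21904 testcase):

  template <typename T> int count() {
    static int a = 0;
    static int b = 0; // 'a' and 'b' could get the same mangled name
    return ++a + ++b;
  }
  template int count<int>(); // the specialization's statics are visible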

This fixes PR21904.

llvm-svn: 224316
2014-12-16 04:52:14 +00:00
Kuba Brecka 731089bbce Add an MACOS_VERSION_UNKNOWN_NEWER enum value for OS X versions above 10.10.
We recently had a broken version check because a newer OS X version is treated as MACOS_VERSION_UNKNOWN, which is less than all the defined values. Let's have a separate enum value for unknown but newer versions, so the ">=" and "<=" version checks still work even in upcoming OS X releases.
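
A sketch of the ordering (values other than the new one are assumptions):

  enum MacosVersion {
    MACOS_VERSION_UNKNOWN = 0,   // version could not be detected
    MACOS_VERSION_MAVERICKS,     // ... known releases, oldest first ...
    MACOS_VERSION_YOSEMITE,
    MACOS_VERSION_UNKNOWN_NEWER  // above every known release, so checks
                                 // like ">= MACOS_VERSION_YOSEMITE" hold
  };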

Reviewed at http://reviews.llvm.org/D6137

llvm-svn: 224315
2014-12-16 04:46:15 +00:00
Saleem Abdulrasool 08408ea86e ARM: 80-column
clang-format a function with an overly long string constant.  NFC.

llvm-svn: 224314
2014-12-16 04:10:10 +00:00
Matthias Braun 1aed6ffa35 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way, and it
addresses the code review issues raised about LiveOutData being hard to
understand/needing more comments, by removing LiveOutData entirely :)

llvm-svn: 224313
2014-12-16 04:03:38 +00:00
Rafael Espindola a23008ad4b Remove the last unnecessary member variable of mapped_file_region. NFC.
llvm-svn: 224312
2014-12-16 03:10:29 +00:00
Rafael Espindola 369d514616 Convert a member variable to a local variable. NFC.
llvm-svn: 224311
2014-12-16 02:53:35 +00:00
Enrico Granata 3cfc49f5e9 Instead of rolling our own, use the C++11 sanctioned solution
llvm-svn: 224310
2014-12-16 02:34:13 +00:00
Rafael Espindola 986f5adf8d Remove unused member and simplify. NFC.
llvm-svn: 224309
2014-12-16 02:19:26 +00:00
Alexey Samsonov 1c65001e5e Fix data symbolization with libbacktrace. Patch by Jakub Jelinek!
llvm-svn: 224308
2014-12-16 01:52:55 +00:00
Rafael Espindola 9d1020648c Start adding thin archive support.
This is just sufficient for 'ar t' to work.

llvm-svn: 224307
2014-12-16 01:43:41 +00:00
Greg Clayton a2162b3166 If a binary was stripped, we sometimes didn't show the ivars of an Objective-C class correctly. Now we do, as we consult the runtime data for the class, so we don't need a symbol in the symbol table.
Fixed:
1 - try the symbol table symbol for an ObjC ivar and use it if available
2 - fall back to using the runtime data since it is slower to gather via memory read
3 - Fixed our hidden ivars test case to test this to ensure we don't regress
4 - split out a test case in the hidden ivars to cover only the part that was failing so we don't have an expected failure for all of the other content in the test.

<rdar://problem/18882687>

llvm-svn: 224306
2014-12-16 01:33:17 +00:00
Alexey Samsonov bba821b5b1 [ASan] Allow to atomically modify malloc_context_size at runtime.
Summary:
Introduce __asan::malloc_context_size atomic that is used to determine
required malloc/free stack trace size. It is initialized with
common_flags()->malloc_context_size flag, but can later be overwritten
at runtime (e.g. when ASan is activated / deactivated).
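
A sketch of the shape (helper names and exact types are assumptions):

  #include "sanitizer_common/sanitizer_atomic.h"

  namespace __asan {
  static atomic_uint32_t malloc_context_size;

  void SetMallocContextSize(u32 size) {
    atomic_store(&malloc_context_size, size, memory_order_release);
  }
  u32 GetMallocContextSize() {
    return atomic_load(&malloc_context_size, memory_order_acquire);
  }
  } // namespace __asan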

Test Plan: regression test suite

Reviewers: kcc, eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6645

llvm-svn: 224305
2014-12-16 01:23:03 +00:00
Jonathan Roelofs ab534366fe Appease the c++14 buildbots
llvm-svn: 224304
2014-12-16 01:18:11 +00:00
Hans Wennborg 45810b4631 Clarify the code in checkDLLAttribute()
Update the comments to make it clearer what's going on, and address
Richard's comments from PR21718. This doesn't fix that bug, but hopefully
makes the code easier to understand.

llvm-svn: 224303
2014-12-16 01:15:01 +00:00
Kevin Enderby c971338def Fix a bug in llvm-objdump’s -private-headers for 32-bit Mach-O files
when printing the section header, and add some tests for this for 32-bit files.

llvm-svn: 224302
2014-12-16 01:14:45 +00:00
Marshall Clow e8a651b02d Comment out the breaking tests until I figure out what's going on here.
llvm-svn: 224301
2014-12-16 00:59:59 +00:00