Commit Graph

219435 Commits

Author SHA1 Message Date
Nicolai Haehnle 82fc962c20 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

llvm-svn: 257074
2016-01-07 17:10:29 +00:00
Nicolai Haehnle 3c05d6d3b5 AMDGPU/SI: xnack_mask is always reserved on VI
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.

I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).

This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.

It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)

Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15898

llvm-svn: 257073
2016-01-07 17:10:20 +00:00
Rui Ueyama 7562d0e490 Fix typo.
They happened to be anagrams.

llvm-svn: 257072
2016-01-07 16:41:06 +00:00
Michael Zuckerman 5b1cad87aa [avx512] Fix test avx512bw-intrinsics.ll
Change the CHECK lablel into AVX512BW 
And fix declare lable of llvm.x86.avx512.mask.psrav32_hi 

llvm-svn: 257071
2016-01-07 16:25:42 +00:00
Michael Zuckerman 3aca221b31 [AVX512] add PSLLW and PSLLV Intrinsic
Differential Revision: http://reviews.llvm.org/D15889

llvm-svn: 257070
2016-01-07 16:02:51 +00:00
Silviu Baranga dd68d46ec1 Revert r257064. It caused failures in some sanitizer tests.
llvm-svn: 257069
2016-01-07 15:46:43 +00:00
Pavel Labath f6d9db4ae8 XFAIL TestMultithreaded on linux
Test sometimes fails even during the reruns, upgrading to xflaky to xfail.

llvm-svn: 257068
2016-01-07 15:24:51 +00:00
Silviu Baranga c67ec3f716 Fix build after r257064: we should be returning false, not nullptr
llvm-svn: 257067
2016-01-07 15:09:22 +00:00
Nico Weber 4324b9b236 Revert r257055, it caused PR26064.
llvm-svn: 257066
2016-01-07 15:01:46 +00:00
Samuel Antao 5812c20d09 [OpenMP] Fix issue in the offloading metadata testing.
- Allow device ID to be signed.
 - Add missing semicolon to some of the CHECK directives.

Thanks to Amjad Aboud for detecting the issue.

llvm-svn: 257065
2016-01-07 14:58:16 +00:00
Silviu Baranga 57b1b90996 [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257064
2016-01-07 14:56:08 +00:00
Michael Zuckerman 354152d590 [AVX512] add PSRAV Intrinsic
Differential Revision: http://reviews.llvm.org/D15856

llvm-svn: 257063
2016-01-07 14:42:20 +00:00
Daniel Jasper 54353dac75 clang-format: Support weird lambda macros.
Before:
  MACRO((AA & a) { return 1; });

After:
  MACRO((AA &a) { return 1; });

llvm-svn: 257062
2016-01-07 14:36:11 +00:00
Ewan Crawford 9272a1c5a2 Remove duplicate header added in r256927
r256927 included a duplicate StreamString header file. This patch simply removes the duplicate.

Author: Luke Drummond <luke.drummond@codeplay.com>
Differential Revision: http://reviews.llvm.org/D15948

llvm-svn: 257061
2016-01-07 14:34:52 +00:00
Amjad Aboud d7cfb48485 Added support for macro emission in dwarf (supporting DWARF version 4).
Differential Revision: http://reviews.llvm.org/D15495

llvm-svn: 257060
2016-01-07 14:28:20 +00:00
James Molloy 9971a6841c [GlobalsAA] Partially back out r248576
See PR25822 for a more full summary, but we were conflating the concepts of "capture" and "escape". We were proving nocapture and using that proof to infer noescape, which is not true. Escaped-ness is a function-local property - as soon as a value is used in a call argument it escapes. Capturedness is a related but distinct property. It implies a *temporally limited* escape. Consider:

  static int a;
  int b;
  int g(int * nocapture arg);
  int f() {
    a = 2;  // Even though a escapes to g, it is not captured so can be treated as non-escaping here.
    g(&a);  // But here it must be treated as escaping.
    g(&b);  // Now that g(&a) has returned we know it was not captured so we can treat it as non-escaping again.
  }

The original commit did not sufficiently understand this nuance and so caused PR25822 and PR26046.

r248576 included both a performance improvement (which has been backed out) and a related conformance fix (which has been kept along with its testcase).

llvm-svn: 257058
2016-01-07 13:33:28 +00:00
Daniel Jasper d337111f02 Add missing -no-canonical-prefixes.
llvm-svn: 257057
2016-01-07 12:53:59 +00:00
Michael Zuckerman a6df006b50 [AVX512] add PSHUFHW and PSHUFLW Intrinsic
Differential Revision: http://reviews.llvm.org/D15925

llvm-svn: 257056
2016-01-07 12:35:43 +00:00
Simon Pilgrim bcc11a059e [X86][AVX] Match broadcast loads through a bitcast
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.

Follow up to D15310

llvm-svn: 257055
2016-01-07 11:34:27 +00:00
Pavel Labath a203b6eb28 Remove some Windows->Android XTIMEOUTs
llvm-svn: 257052
2016-01-07 11:16:30 +00:00
Pavel Labath b08f8db32d XFAIL TestEvents.test_add_listener_to_broadcaster
Upgrade flaky to xfail, as the test sometimes fails even during the rerun.

llvm-svn: 257050
2016-01-07 10:53:40 +00:00
Dylan McKay 5c96de3ad7 Added AVRTargetObjectFile class and AVR.h
llvm-svn: 257049
2016-01-07 10:53:15 +00:00
Tamas Berghammer 904d5fe496 Mark arm as the 32bit variant of aarch64 in Triple
Change Triple::get32BitArchVariant to return arm/armeb as the 32bit
variant of aarch64/aarch64_be and do the same change for the oppoiste
direction in Triple::get64BitArchVariant.

Differential revision: http://reviews.llvm.org/D15529

llvm-svn: 257048
2016-01-07 10:41:12 +00:00
Junmo Park 1238610aa1 Remove extra whitespace. NFC.
llvm-svn: 257047
2016-01-07 10:26:32 +00:00
Simon Pilgrim 83e44c66ae [X86][SSE} Add INSERTPS as a target shuffle
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.

llvm-svn: 257046
2016-01-07 10:24:19 +00:00
Ewan Crawford 26e52a7074 [RenderScript] Improve file format for saving RS allocations
Updates the file format for storing RS allocations to a file, so that the format now supports struct element types.

The file header will now contain a subheader for every RS element and it's descendants. 
Where an element subheader contains element type details and offsets to the subheaders of that elements fields. 

Patch also improves robustness when loading incorrect files.

llvm-svn: 257045
2016-01-07 10:19:09 +00:00
Michael Zuckerman 4a1566827d [AVX512] add PSHUFD Intrinsic
Differential Revision: http://reviews.llvm.org/D15934

llvm-svn: 257044
2016-01-07 09:24:12 +00:00
Sergey Kalinichev b8d516a8d9 [libclang] Handle AutoType in clang_getTypeDeclaration
Differential Revision: http://reviews.llvm.org/D13001

llvm-svn: 257043
2016-01-07 09:20:40 +00:00
Kuba Brecka 490b7f8b6d Follow-up fix for r256988 to unbreak the Linux buildbot.
llvm-svn: 257042
2016-01-07 09:14:41 +00:00
Tim Northover a12d0a99b1 ARM: allow __thread on OS versions that have the required runtime support.
llvm-svn: 257041
2016-01-07 09:04:46 +00:00
Eric Christopher 3f07d85826 Make sure we claim arguments that are going to be passed to a gcc tool,
even if they're not going to be used to avoid unused option warnings.

llvm-svn: 257040
2016-01-07 09:03:42 +00:00
Tim Northover bd41cf880c ARM: support TLS accesses on Darwin platforms
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.

llvm-svn: 257039
2016-01-07 09:03:03 +00:00
Daniel Jasper efc1a83a99 clang-format: [JS] Support more ES6 imports.
Before:
  import a, {X, Y}
  from 'some/module.js';

After:
  import a, {X, Y} from 'some/module.js';

llvm-svn: 257038
2016-01-07 08:53:35 +00:00
Michael Liao a809acec40 Modernize to range-based loop
llvm-svn: 257037
2016-01-07 07:58:25 +00:00
Jonas Paulsson 3939b690f6 [SystemZ] Add hasSideEffects flag on Serialize instruction.
Serialize will perform a hardware serialization operation, and is
acting as a memory barrier. Therefore it must have the hasSideEffects
flag set so it will be treated as a global memory object.

Reviewed by Ulrich Weigand

llvm-svn: 257036
2016-01-07 07:20:55 +00:00
Craig Topper 68cffb17a0 [X86] Remove superfluous mayLoad flag. The pattern already implies it.
llvm-svn: 257035
2016-01-07 06:42:10 +00:00
Craig Topper 79e0ef82e8 [X86] Had hasSideEffects=0 to VBROADCASTI128.
llvm-svn: 257034
2016-01-07 06:37:55 +00:00
Craig Topper 04cc5d25c7 [X86] Add OpSize32 to MOVSX32_NOREX instructions to match their other versions.
llvm-svn: 257033
2016-01-07 06:37:52 +00:00
Craig Topper 0b165557b2 [X86] Add hasSideEffects=0 and mayLoad=1 to MOVZX64* instructions. While there remove a superfluous _Q from the instruction names.
llvm-svn: 257032
2016-01-07 05:57:39 +00:00
NAKAMURA Takumi 7e887bd80d llvm/test/CodeGen/X86/statepoint-vector.ll REQUIRES asserts due to a debug option.
llvm-svn: 257031
2016-01-07 05:40:37 +00:00
Craig Topper fc678ba944 [X86] STOSQ without a rep prefix doesn't read or write RCX.
llvm-svn: 257030
2016-01-07 05:18:49 +00:00
Tom Stellard 68a55e6c34 [ELF] Fix REQUIRES line in amdgpu tests
'REQUIRES' needs to be in all caps.

Also update lit.cfg so these tests get run when the AMDGPU target is
enabled.

llvm-svn: 257029
2016-01-07 05:02:38 +00:00
David Majnemer 0e90f46e10 Undo spurious change made in r256965
llvm-svn: 257028
2016-01-07 04:31:35 +00:00
Tom Stellard a09245cddb [ELF] Fix amdgpu tests
The triples in these tests were completely wrong, but the tests were
still passing for me locally.

llvm-svn: 257027
2016-01-07 04:22:24 +00:00
Philip Reames cffc628ca1 One more attempt at stablizing a test on all platforms.
llvm-svn: 257026
2016-01-07 04:20:52 +00:00
Philip Reames afdbcc6a84 [Statepoints] Add test cases around vectors and stablize test
Unlike my comment in 257022 said, it turns out we do handle constant vectors in the statepoint lowering, but only because SelectionDAG doesn't actually produce constants for them.  Add a couple of tests which show this working.

Also, add a triple to the same test file to hopefully fix a failing bot.

It turns out we do han

llvm-svn: 257025
2016-01-07 04:15:31 +00:00
Haicheng Wu 08b9462540 [AArch64 MachineCombine] Enhance/Add support for general reassociation to reduce the critical path
Allow fadd/fmul to be reassociated in aarch64.

llvm-svn: 257024
2016-01-07 04:01:02 +00:00
Tom Stellard 80efb16aad [ELF] Add AMDGPU support
Summary: This will allow us to remove the AMDGPU support from old ELF.

Reviewers: rafael, ruiu

Differential Revision: http://reviews.llvm.org/D15895

llvm-svn: 257023
2016-01-07 03:59:08 +00:00
Philip Reames 3e2cf5320c [Statepoints] Initial support for relocating vectors of pointers
Currently, we try to split vectors of pointers back into their component pointer elements during rewrite-statepoints-for-gc. This is less than ideal since presumably the vectorizer chose to vectorize for a reason. :) It's also been a source of bugs - in particular, the relocation logic as currently implemented was recently discovered to be wrong.

The alternate approach is to allow gc.relocates of vector-of-pointer type and update the backend to handle them. That's what this patch tries to do. This won't actually enable vector-of-pointers in practice - there are some RS4GC changes needed - but the lowering is standalone and testable so it makes sense to separate.

Note that there are some known cases around vector constants which this patch does not handle. Once this is in, I'll send another patch with individual fixes and test cases. 

Differential Revision: http://reviews.llvm.org/D15632

llvm-svn: 257022
2016-01-07 03:32:11 +00:00
Dan Gohman 0d75f2c538 [WebAssembly] Add -m:e to the target triple.
llvm-svn: 257021
2016-01-07 03:20:15 +00:00