Commit Graph

227813 Commits

Author SHA1 Message Date
Renato Golin 5cb666add7 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
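
A minimal sketch of the check described above, in Python rather than the actual LLVM C++ code. The names `Instr`, `fast_math`, and `may_vectorize` are hypothetical and only illustrate the rule: when the target's SIMD unit is not IEEE-754 compliant, a loop is accepted only if it has no floating-point instructions, or every floating-point instruction carries a flag permitting reduced precision.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: str
    is_fp: bool = False      # does the instruction do floating-point math?
    fast_math: bool = False  # hypothetical stand-in for the "allowance" flags


def may_vectorize(loop, simd_is_ieee754):
    """Return True if the candidate loop may be vectorized."""
    if simd_is_ieee754:
        # A compliant SIMD unit needs no extra FP restrictions.
        return True
    # Otherwise every FP instruction must explicitly allow unsafe math.
    # A loop with no FP instructions passes trivially.
    return all(instr.fast_math for instr in loop if instr.is_fp)
```

For example, `may_vectorize([Instr("fadd", is_fp=True)], simd_is_ieee754=False)` is rejected, while the same loop with `fast_math=True` is accepted.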

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM has a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normals flag in both communities
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
Sanjay Patel e998b91d86 [InstCombine] remove constant by inverting compare + logic (PR27105)
https://llvm.org/bugs/show_bug.cgi?id=27105

We can check if all bits outside of a constant mask are set with a 
single constant.

As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc' 
comparison against zero because that could be a single machine instruction.
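
The bit identity behind this can be checked exhaustively on a small width. The sketch below assumes the three forms involved are a mask-and-compare, an or-and-compare against all ones, and an 'andn'-style compare against zero; the function names and the 8-bit width are illustrative, and the exact IR patterns are in the bug report rather than reproduced here.

```python
BITS = 8
ALL_ONES = (1 << BITS) - 1


def masked_compare(x, c):
    # Uses the inverted mask twice: (x & ~c) == ~c
    inv = ~c & ALL_ONES
    return (x & inv) == inv


def or_compare(x, c):
    # Uses the mask once: (x | c) == -1
    return (x | c) == ALL_ONES


def andn_compare(x, c):
    # Backend-friendly form: (~x & ~c) == 0
    return ((~x & ~c) & ALL_ONES) == 0
```

All three return True exactly when every bit of `x` outside the mask `c` is set.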

Differential Revision: http://reviews.llvm.org/D18842

llvm-svn: 266362
2016-04-14 20:17:40 +00:00
Greg Clayton dfa63248c7 Fix Xcode project after recent s390x changes.
llvm-svn: 266361
2016-04-14 20:05:21 +00:00
Dehao Chen 34cc676732 Fix null pointer access for discriminator assignment.
Summary: This fixes the buildbot failure.

Reviewers: dnovillo, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19129

llvm-svn: 266360
2016-04-14 19:46:38 +00:00
Richard Smith f32cc293fa Make this code less brittle. The benefits of a fixed-size array aren't worth the maintenance cost.
llvm-svn: 266359
2016-04-14 19:45:19 +00:00
Aaron Ballman b602ee7d8f Add support for type aliases to modernize-redundant-void-arg.cpp
Patch by Clement Courbet.

llvm-svn: 266358
2016-04-14 19:28:13 +00:00
Rafael Espindola c9157d35d9 Hash symbol names only once per global SymbolBody.
The DenseMap doesn't store hash results. This means that when it is
resized it has to recompute them.

This patch is a small hack that wraps the StringRef in a struct that
remembers the hash value. That way we can be sure it is only hashed
once.
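
The wrapper idea can be sketched as follows. This is a Python illustration, not the lld code: `CachedHashName` and its `hash_calls` counter are hypothetical names. Note that Python's own `dict` already stores hashes, so the sketch only shows the pattern of computing the hash once at construction and returning the remembered value afterwards.

```python
class CachedHashName:
    """Wraps a name and remembers its hash so it is computed only once."""

    hash_calls = 0  # counts how often the underlying string was hashed

    def __init__(self, name):
        self.name = name
        CachedHashName.hash_calls += 1
        self._hash = hash(name)  # the only place the string is hashed

    def __hash__(self):
        # Table inserts, lookups, and resizes all reuse the stored value.
        return self._hash

    def __eq__(self, other):
        return isinstance(other, CachedHashName) and self.name == other.name
```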

llvm-svn: 266357
2016-04-14 19:17:16 +00:00
Tom Stellard 000c5af3e6 AMDGPU: Add skeleton GlobalISel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Rafael Espindola 3151d89595 Simplify handling of size relocations. NFC.
llvm-svn: 266355
2016-04-14 18:39:44 +00:00
Dehao Chen 46f8fbbb1b Update discriminator assignment algorithm to handle nested call correctly.
Summary: Add discriminator for nested call correctly.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19127

llvm-svn: 266354
2016-04-14 18:37:18 +00:00
Richard Smith 9c4fb0a833 Fix off-by-one error in worst-case number of offsets needed for an AST record.
llvm-svn: 266353
2016-04-14 18:32:54 +00:00
Ulrich Weigand acc50e0a99 Fix regression in gnu_libstdcpp.py introduced by r266313
CreateChildAtOffset needs a byte offset, not an element number.

llvm-svn: 266352
2016-04-14 18:31:12 +00:00
Reid Kleckner 28865809fe Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h
MachineInstr.h and MachineInstrBuilder.h are very popular headers,
widely included across all LLVM backends. It turns out that there are only a
handful of TUs that actually care about DI operands on MachineInstrs.

After this change, touching DebugInfoMetadata.h and rebuilding llc only
needs 112 actions instead of 542.

llvm-svn: 266351
2016-04-14 18:29:59 +00:00
Davide Italiano 96d2a1c603 [ValueMapper] Range-loopify to improve readability. NFC.
llvm-svn: 266350
2016-04-14 18:07:32 +00:00
Jacques Pienaar ad1db3597e [lanai] Add custom lowering for SRL_PARTS i32.
llvm-svn: 266349
2016-04-14 17:59:22 +00:00
Tom Stellard cef0fe4245 [GlobalISel] Move GISelAccessor class into public headers
Reviewers: qcolombet

Subscribers: joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19120

llvm-svn: 266348
2016-04-14 17:45:38 +00:00
Nicolai Haehnle 13d90f324c [DivergenceAnalysis] Treat PHI with incoming undef as constant
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
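
One way to read this rule, as a sketch: ignore undef and self-referencing incoming values, and if everything that remains is the same value, treat the PHI as that value. The `UNDEF` sentinel and `phi_constant_value` below are hypothetical names, not the DivergenceAnalysis API.

```python
UNDEF = object()  # hypothetical sentinel standing in for an undef operand


def phi_constant_value(phi, incoming):
    """Return the single value the PHI can be treated as, or None."""
    candidate = None
    for value in incoming:
        if value is UNDEF or value is phi:
            continue  # undef and self-references do not constrain the result
        if candidate is None:
            candidate = value
        elif value != candidate:
            return None  # two distinct real inputs: not constant
    return candidate
```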

This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this caused
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.

This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex

Reviewers: arsenm, tstellarAMD, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19013

llvm-svn: 266347
2016-04-14 17:42:47 +00:00
Nicolai Haehnle 05b127da06 [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Nicolai Haehnle 723b73b4eb AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345
2016-04-14 17:42:29 +00:00
Nicolai Haehnle 19f0f5177d AMDGPU: change a redundant if () to an assert(). NFC
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19042

llvm-svn: 266344
2016-04-14 17:42:18 +00:00
Ulrich Weigand 4f42310dfb Disable LinuxCoreTestCase.test_s390x
This seems to hang on non-s390x hosts.  Disable for now to get the build
bots going again.

llvm-svn: 266343
2016-04-14 17:36:41 +00:00
Tom Stellard b72a65ff53 [GlobalISel] Coding style and whitespace fixes
Reviewers: qcolombet

Subscribers: joker.eph, llvm-commits, vkalintiris

Differential Revision: http://reviews.llvm.org/D19119

llvm-svn: 266342
2016-04-14 17:23:33 +00:00
Ulrich Weigand da70c17bfc Revert r266311 - Fix usage of APInt.getRawData for big-endian systems
Try to get 32-bit build bots running again.

llvm-svn: 266341
2016-04-14 17:22:18 +00:00
George Rimar 5cfd306e00 Move variables closer to code scopes that use them. NFC.
llvm-svn: 266340
2016-04-14 17:05:56 +00:00
Tim Northover cdf1529c01 AArch64: expand cmpxchg after regalloc at -O0.
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.

I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.

It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.

Should fix PR25526.

llvm-svn: 266339
2016-04-14 17:03:29 +00:00
Jacques Pienaar add4a274ba [lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth.
Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched.

Reviewers: eliben, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18903

llvm-svn: 266338
2016-04-14 16:47:42 +00:00
Tom Stellard 79a1fd718c AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
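
The trade-off between workgroup size and register budget can be sketched with some arithmetic. The hardware figures below (64 work items per wave, 4 SIMDs per compute unit, 256 vector registers per SIMD) are assumptions for illustration and do not come from the commit message; the function names are hypothetical.

```python
WAVE_SIZE = 64        # assumption: work items per wave
SIMDS_PER_CU = 4      # assumption: SIMD units per compute unit
VGPRS_PER_SIMD = 256  # assumption: vector registers available per SIMD


def waves_per_simd(max_workgroup_size):
    """Waves each SIMD must host so the whole workgroup fits in one CU."""
    waves = -(-max_workgroup_size // WAVE_SIZE)  # ceiling division
    return -(-waves // SIMDS_PER_CU)


def max_vgprs_per_wave(max_workgroup_size):
    """Register budget per wave under the assumed figures."""
    return VGPRS_PER_SIMD // waves_per_simd(max_workgroup_size)
```

Under these assumptions a workgroup of 256 needs one wave per SIMD and keeps the full register budget, while a workgroup of 1024 needs four waves per SIMD and leaves each wave a quarter of it.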

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard f110f8f9f7 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders, though.

The register should be the next SGPR after all user and system SGPRs.
We rely on Mesa adding arguments for all input and system SGPRs and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Betul Buyukkurt 4f1e8c94bf [PGO] Do not attach VP metadata if value count at site is 0 [NFC]
llvm-svn: 266335
2016-04-14 16:25:45 +00:00
Silviu Baranga b77365b595 [SCEV][LAA] Add tests for SCEV expression transformations performed during LAA
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.

Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.

The additional checking also acts as white-box testing for the getAsAddRec method.

Reviewers: anemet, sanjoy

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18792

llvm-svn: 266334
2016-04-14 16:08:45 +00:00
Etienne Bergeron 47205aa773 [clang-tidy] Fix documentation generation.
Summary: The patch is fixing the generation of clang-tidy documentation.

Reviewers: alexfh

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D19121

llvm-svn: 266333
2016-04-14 16:08:04 +00:00
Jonathan Peyton 99ef4d0433 [ITTNOTIFY] Correct barrier imbalance time in case of tasks
ittnotify fix for barrier imbalance time in case tasks exist. In the current
implementation, task execution time is included into aggregated time on a
barrier. This fix calculates task execution time and corrects the arrive time
by subtracting the task execution time.

Since __kmp_invoke_task() can be called not only on a barrier, the field
th.th_bar_arrive_time is used to check if the function was called at the
barrier (th.th_bar_arrive_time != 0). So for this check, th_bar_arrive_time
is set to zero right after the value is used on the barrier.

Differential Revision: http://reviews.llvm.org/D19030

llvm-svn: 266332
2016-04-14 16:06:49 +00:00
Aaron Ballman 66eb58a756 Add typedefNameDecl() and typeAliasDecl() to the AST matchers; improves hasType() to match on TypedefNameDecl nodes.
Patch by Clement Courbet.

llvm-svn: 266331
2016-04-14 16:05:45 +00:00
Rafael Espindola e149b488af Remove the only case where we would relocate a R_386_TLS_TPOFF.
llvm-svn: 266330
2016-04-14 16:05:42 +00:00
Jonathan Peyton 377aa40d84 Exponential back off logic for test-and-set lock
This change adds back off logic in the test and set lock for better contended
lock performance. It uses a simple truncated binary exponential back off
function. The default back off parameters are tuned for x86.

The main back off logic has a two-loop structure where each loop is controlled by a
user-level parameter:
max_backoff - limits the outer loop number of iterations.
    This parameter should be a power of 2.
min_ticks - the inner spin wait loop number of "ticks" which is system
    dependent and should be tuned for your system if you so choose.
    The "ticks" on x86 correspond to the time stamp counter,
    but on other architectures ticks is a timestamp derived
    from gettimeofday().

The user can modify these via the environment variable:
KMP_SPIN_BACKOFF_PARAMS=max_backoff[,min_ticks]
Currently, since the default user lock is a queuing lock,
one would have to also specify KMP_LOCK_KIND=tas to use the test-and-set locks.
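
A rough sketch of truncated binary exponential back off and of parsing the `max_backoff[,min_ticks]` format, in Python rather than the runtime's C++. The default of 100 ticks, the doubling rule, and both function names are assumptions for illustration; the runtime's actual defaults and step function may differ.

```python
def parse_backoff_params(value, default_min_ticks=100):
    """Parse 'max_backoff[,min_ticks]'; the default tick count is arbitrary."""
    parts = value.split(",")
    max_backoff = int(parts[0])
    min_ticks = int(parts[1]) if len(parts) > 1 else default_min_ticks
    if max_backoff <= 0 or max_backoff & (max_backoff - 1):
        raise ValueError("max_backoff should be a power of 2")
    return max_backoff, min_ticks


def backoff_schedule(max_backoff, min_ticks, attempts):
    """Ticks to wait after each failed lock attempt.

    The outer loop count doubles per failure and is truncated at
    max_backoff; each outer iteration spins for min_ticks.
    """
    step = 1
    waits = []
    for _ in range(attempts):
        waits.append(step * min_ticks)
        step = min(step * 2, max_backoff)
    return waits
```

With `max_backoff=8` and `min_ticks=10`, six consecutive failures wait 10, 20, 40, 80, 80, and 80 ticks.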

Differential Revision: http://reviews.llvm.org/D19020

llvm-svn: 266329
2016-04-14 16:00:37 +00:00
Rafael Espindola c1c14a227e Merge duplicated cases. NFC.
llvm-svn: 266328
2016-04-14 15:56:14 +00:00
Pavel Labath 7e49e3d97c [test] make expect_state_changes actually expect *only* them
The Android dirty stderr problem has uncovered an issue where lldbutil.expect_state_changes was
reading events other than state change events, which resulted in general confusion. Make it more
strict to accept *only* state changes.

llvm-svn: 266327
2016-04-14 15:52:58 +00:00
Pavel Labath e6961d0306 [test] Relax stderr expectations on targets with chatty output
Summary:
On some Android targets, a binary can produce additional garbage (e.g. warning messages from the
dynamic linker) on the standard error, which confuses some tests. This relaxes the stderr
expectations for targets known for their chattiness.

Reviewers: tfiala, ovyalov

Subscribers: tberghammer, danalbert, srhines, lldb-commits

Differential Revision: http://reviews.llvm.org/D19114

llvm-svn: 266326
2016-04-14 15:52:53 +00:00
Ismail Donmez f677cd539f Fix testcase for the LLVM_LIBDIR_SUFFIX=64 case. Fallout from r266108.
llvm-svn: 266324
2016-04-14 15:32:24 +00:00
Michael Kruse 5c7d0cb834 Add contexts to test cases. NFC.
As discussed in the Polly weekly phone call and reviews.llvm.org/D18878,
the assumed contexts changed (widened) due to D18878/r265942. Also check
these contexts in the tests affected by that change.

llvm-svn: 266323
2016-04-14 15:22:13 +00:00
Michael Kruse b931d4c387 Add InvalidContext to update_test.py.
This allows the test update script to add 'Invalid Context:' to test
cases. Enable with --check-include=InvalidContext.

llvm-svn: 266322
2016-04-14 15:22:04 +00:00
Marianne Mailhot-Sarrasin 03137c6538 clang-format: Last line in incomplete block is indented incorrectly
Indentation of the last line was reset to the initial indentation of the block when reaching EOF.

Patch by Maxime Beaulieu

Differential Revision: http://reviews.llvm.org/D19065

llvm-svn: 266321
2016-04-14 14:56:49 +00:00
Marianne Mailhot-Sarrasin 51fe279fbb clang-format: Implemented tab usage for continuation and indentation
Use tabs to fill whitespace at the start of a line.

Patch by Maxime Beaulieu

Differential Revision: http://reviews.llvm.org/D19028

llvm-svn: 266320
2016-04-14 14:52:26 +00:00
Marianne Mailhot-Sarrasin 4988fa1bc5 clang-format: Allow include of clangFormat.h in managed context
Including VirtualFileSystem.h in the clangFormat.h indirectly includes <atomic>.
This header is blocked when compiling with /clr.

Patch by Maxime Beaulieu

Differential Revision: http://reviews.llvm.org/D19064

llvm-svn: 266319
2016-04-14 14:47:37 +00:00
Rafael Espindola 735bbaa1d9 Add missing typename.
llvm-svn: 266318
2016-04-14 14:40:38 +00:00
George Rimar 8bbff7ec85 [ELF] - Refactoring of end/edata/etext implementation.
Minor refactoring of how end/edata/etext symbols are handled.

Differential revision: http://reviews.llvm.org/D19109

llvm-svn: 266317
2016-04-14 14:37:59 +00:00
Ulrich Weigand bd5262629d Find .plt section in object files generated by recent ld
Code in ObjectFileELF::ParseTrampolineSymbols assumes that the sh_info
field of the .rel(a).plt section identifies the .plt section.

However, with recent GNU ld this is no longer true.  As a result of this:
https://sourceware.org/bugzilla/show_bug.cgi?id=18169
in object files generated with current linkers the sh_info field of
.rel(a).plt now points to the .got.plt section (or .got on some targets).

This causes LLDB to fail to identify any PLT stubs, causing a number of
test case failures.

This patch changes LLDB to simply always look for the .plt section by
name.  This should be safe across all linkers and targets.
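
The difference between the two lookup strategies can be sketched with a toy section table. The dictionaries below are a hypothetical stand-in for ELF section headers, not LLDB's data structures, and the layout mimics what the message describes for recent GNU ld.

```python
def find_section(sections, name):
    """Return the index of the section with the given name, or None."""
    for index, section in enumerate(sections):
        if section["name"] == name:
            return index
    return None


def plt_via_sh_info(sections):
    """Old approach: trust sh_info of .rela.plt to identify .plt."""
    rel = find_section(sections, ".rela.plt")
    return None if rel is None else sections[rel]["sh_info"]


# Toy layout: sh_info of .rela.plt points at .got.plt, not .plt.
sections = [
    {"name": ".plt", "sh_info": 0},
    {"name": ".got.plt", "sh_info": 0},
    {"name": ".rela.plt", "sh_info": 1},
]
```

Here `plt_via_sh_info(sections)` lands on `.got.plt`, while `find_section(sections, ".plt")` finds the intended section regardless of what `sh_info` holds.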

Differential Revision: http://reviews.llvm.org/D18973

llvm-svn: 266316
2016-04-14 14:36:29 +00:00
Ulrich Weigand 7e8de59b90 Fix test cases for big-endian systems
A number of test cases were failing on big-endian systems simply due to
byte order assumptions in the tests themselves, and no underlying bug
in LLDB.

These two test cases:
  tools/lldb-server/lldbgdbserverutils.py
  python_api/process/TestProcessAPI.py
actually check for big-endian target byte order, but contain Python errors
in the corresponding code paths.

These test cases:
  functionalities/data-formatter/data-formatter-python-synth/TestDataFormatterPythonSynth.py
  functionalities/data-formatter/data-formatter-smart-array/TestDataFormatterSmartArray.py
  functionalities/data-formatter/synthcapping/TestSyntheticCapping.py
  lang/cpp/frame-var-anon-unions/TestFrameVariableAnonymousUnions.py
  python_api/sbdata/TestSBData.py  (first change)
could be fixed to check for big-endian target byte order and update the
expected result strings accordingly.  For the two synthetic tests, I've
also updated the source to make sure the fake_a value is always nonzero
on both big- and little-endian platforms.

These test cases:
  python_api/sbdata/TestSBData.py  (second change)
  functionalities/memory/cache/TestMemoryCache.py
simply accessed memory with the wrong size, which wasn't noticed on LE
but fails on BE.
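
Why a wrong-size read goes unnoticed on little-endian can be shown in a few lines. This is a standalone illustration, not code from the tests; `read_uint` is a hypothetical helper.

```python
def read_uint(memory, size, byte_order):
    """Read an unsigned integer of the given size from the start of memory."""
    return int.from_bytes(memory[:size], byte_order)


value = 5
little = value.to_bytes(4, "little")  # 05 00 00 00
big = value.to_bytes(4, "big")        # 00 00 00 05

# Reading only 1 byte of a 4-byte value:
narrow_le = read_uint(little, 1, "little")  # low byte comes first
narrow_be = read_uint(big, 1, "big")        # high byte comes first
```

The narrow little-endian read still sees the low-order byte, so small values look correct; the narrow big-endian read sees only the high-order byte.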

Differential Revision: http://reviews.llvm.org/D18985

llvm-svn: 266315
2016-04-14 14:35:02 +00:00
Ulrich Weigand 91a2ad182d Fix ARM instruction emulation tests on big-endian systems
Running the ARM instruction emulation test on a big-endian system
would fail, since the code doesn't respect endianness properly.

In EmulateInstructionARM::TestEmulation, code assumes that an
instruction opcode read in from the test file is in target byte
order, but it was in fact read in in host byte order.

More difficult to fix, the EmulationStateARM structure models
the overlapping sregs and dregs by a union in _sd_regs.  This
only works correctly if the host is a little-endian system.
I've removed the union in favor of a simple array containing
the 32 sregs, and changed any code accessing dregs to explicitly
use the correct two sregs overlaying that dreg in the proper
target order.

Also, the EmulationStateARM::ReadPseudoMemory and WritePseudoMemory
track memory as a map of uint32_t values in host byte order, and
implement 64-bit memory accessing by splitting them up into two
uint32_t ones.  However, callers expect memory contents to be
provided in the form of a byte array (in target byte order).
This means the uint32_t contents need to be byte-swapped on
BE systems, and when splitting up a 64-bit access into two 32-bit
ones, byte order has to be respected.
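
Splitting a 64-bit access into two 32-bit ones while respecting byte order can be sketched like this. The helpers `split_u64` and `join_u64` are hypothetical and work on plain integers; they are not the EmulationStateARM interface.

```python
def split_u64(value, byte_order):
    """Split a 64-bit value into the two 32-bit words as laid out in memory."""
    raw = value.to_bytes(8, byte_order)
    first = int.from_bytes(raw[:4], byte_order)   # word at the lower address
    second = int.from_bytes(raw[4:], byte_order)  # word at the higher address
    return first, second


def join_u64(first, second, byte_order):
    """Inverse of split_u64: rebuild the 64-bit value from its two words."""
    raw = first.to_bytes(4, byte_order) + second.to_bytes(4, byte_order)
    return int.from_bytes(raw, byte_order)
```

For `0x1122334455667788`, the word at the lower address is `0x55667788` on a little-endian target and `0x11223344` on a big-endian one, which is why the order of the two halves cannot be hard-coded.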

Differential Revision: http://reviews.llvm.org/D18984

llvm-svn: 266314
2016-04-14 14:34:19 +00:00
Ulrich Weigand 0501eebda6 Miscellaneous fixes for big-endian systems
This patch fixes a bunch of issues that show up on big-endian systems:

- The gnu_libstdcpp.py script doesn't follow the way libstdc++ encodes
  bit vectors: it should identify the enclosing *word* and then access
  the appropriate bit within that word.  Instead, the script simply
  operates on bytes.  This gives the same result on little-endian
  systems, but not on big-endian.

- lldb_private::formatters::WCharSummaryProvider always assumes wchar_t
  is UTF16, even though it could also be UTF8 or UTF32.  This is mostly
  not an issue on little-endian systems, but immediately fails on BE.
  Fixed by checking the size of wchar_t like WCharStringSummaryProvider
  already does.

- ClangASTContext::GetChildCompilerTypeAtIndex uses uint32_t to access
  the virtual base offset stored in the vtable, even though the size
  of this field matches the target pointer size according to the C++
  ABI.  Again, this is mostly not visible on LE, but fails on BE.

- Process::ReadStringFromMemory uses strncmp to search for a terminator
  consisting of multiple zero bytes.  This doesn't work since strncmp
  will stop already at the first zero byte.  Use memcmp instead.
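
The first item, word-based versus byte-based bit access, can be illustrated as follows. This is a sketch, not the gnu_libstdcpp.py code: the function names are hypothetical and an 8-byte word is assumed for the example.

```python
def bit_by_word(memory, index, word_size, byte_order):
    """Find the enclosing word, decode it, then test the bit within it."""
    bits = word_size * 8
    start = (index // bits) * word_size
    word = int.from_bytes(memory[start:start + word_size], byte_order)
    return (word >> (index % bits)) & 1


def bit_by_byte(memory, index):
    """Byte-wise shortcut: only matches the word view on little-endian."""
    return (memory[index // 8] >> (index % 8)) & 1


word_with_bit3 = 1 << 3
little = word_with_bit3.to_bytes(8, "little")
big = word_with_bit3.to_bytes(8, "big")
```

On the little-endian buffer both functions report bit 3 as set; on the big-endian buffer only `bit_by_word` does, because the low-order byte of the word sits at the highest address.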

Differential Revision: http://reviews.llvm.org/D18983

llvm-svn: 266313
2016-04-14 14:33:47 +00:00