Commit Graph

253121 Commits

Author SHA1 Message Date
Matthew Simpson 3650df13be [ARM/AArch64] Relocate and update InterleavedAccessPass tests (NFC)
The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.

Although the pass is an IR pass, many of the existing tests were llc tests
rather opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).

Differential Revision: https://reviews.llvm.org/D29184

llvm-svn: 293309
2017-01-27 17:33:16 +00:00
Matt Arsenault 32b9600a7e NVPTX: Make NVPTXInferAddressSpaces preserve CFG
llvm-svn: 293308
2017-01-27 17:30:39 +00:00
Jun Bum Lim b99a06b7c9 [CodeGenPrep]No negative cost in the ExtLd promotion
Summary: This change prevent the signed value of cost from being negative as the value is passed as an unsigned argument.

Reviewers: mcrosier, jmolloy, qcolombet, javed.absar

Reviewed By: mcrosier, qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28871

llvm-svn: 293307
2017-01-27 17:16:37 +00:00
Hans Wennborg 091f1b6ef3 clang-cl: Warn about /U flags that look like filenames (PR31662)
Both on Mac and Windows, it's common to have a 'Users' directory in the
root of the filesystem, so one might specify a filename as
'/Users/me/myfile.c'. clang-cl (as well as MSVC's cl.exe) will interpret
that as invoking the '/U' option, which is probably not what the user
wanted. Add a warning about this.

Differential Revision: https://reviews.llvm.org/D29198

llvm-svn: 293305
2017-01-27 17:09:41 +00:00
Michael Kruse 33dc454700 [CodePrepa] Remove unused declaration. NFC.
llvm-svn: 293304
2017-01-27 16:59:09 +00:00
Stanislav Mekhanoshin f6c1feb8c3 [AMDGPU] Turn AMDGPUUnifyMetadata back into module pass
With the adjustPassManager interface that is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300
2017-01-27 16:38:10 +00:00
Mehdi Amini 1726fc698c Fix BasicAA incorrect assumption on GEP
This is fixing pr31761: BasicAA is deducing NoAlias
on the result of the GEP if the base pointer is itself NoAlias.

This is possible only if the NoAlias on the base pointer is
deduced with a non-sized query: this should guarantee that
the pointers are belonging to different memory allocation
and that the GEP can't legally jump from one to another.

Differential Revision: https://reviews.llvm.org/D29216

llvm-svn: 293293
2017-01-27 16:12:22 +00:00
Ivan Krasin c05c9db364 Avoid using unspecified ordering in MetadataLoader::MetadataLoaderImpl::parseOneMetadata.
Summary:
MetadataLoader::MetadataLoaderImpl::parseOneMetadata uses
the following construct in a number of places:

```
MetadataList.assignValue(<...>, NextMetadataNo++);
```

There, NextMetadataNo gets incremented, and since the order
of arguments evaluation is not specified, that can happen
before or after other arguments are evaluated.

In a few cases the other arguments indirectly use NextMetadataNo.
For instance, it's

```
MetadataList.assignValue(
    GET_OR_DISTINCT(DIModule,
                    (Context, getMDOrNull(Record[1]),
                     getMDString(Record[2]), getMDString(Record[3]),
                     getMDString(Record[4]), getMDString(Record[5]))),
    NextMetadataNo++);
```

getMDOrNull calls getMD that uses NextMetadataNo:

```
MetadataList.getMetadataFwdRef(NextMetadataNo);
```

Therefore, the order of evaluation becomes important. That caused
a very subtle LLD crash that only happens if compiled with GCC or
if LLD is built with LTO. In the case if LLD is compiled with Clang
and regular linking mode, everything worked as intended.

This change extracts incrementing of NextMetadataNo outside of
the arguments list to guarantee the correct order of evaluation.

For the record, this has taken 3 days to track to the origin. It all
started with a ThinLTO bot in Chrome not being able to link a target
if debug info is enabled.

Reviewers: pcc, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D29204

llvm-svn: 293291
2017-01-27 15:54:49 +00:00
Rafael Espindola 403b093eff Fix and simplify the reporting of undefined symbols.
Now reportUndefined only has to look at Config->UnresolvedSymbols and
the symbol. getUnresolvedSymbolOption does all the hard work of
mapping options like -shared and -z defs to one of the
UnresolvedPolicy enum entries.

The critical fix is that now "-z defs --warn-unresolved-symbols" only
warns.

llvm-svn: 293290
2017-01-27 15:52:08 +00:00
Pavel Labath 095d6b8f99 Address post-commit review remarks
Tamas pointed out that the macro name I used in r293282 was too vague.
Rename it to better reflect what it is used for.

llvm-svn: 293287
2017-01-27 15:19:03 +00:00
Anastasia Stulova af0a7bbbe2 [OpenCL] Add missing address spaces in IR generation of blocks
Modify ObjC blocks impl wrt address spaces as follows:

- keep default private address space for blocks generated
as local variables (with captures);

- add global address space for global block literals (no captures);

- make the block invoke function and enqueue_kernel prototype with
the generic AS block pointer parameter to accommodate both 
private and global AS cases from above;

- add block handling into default AS because it's implemented as
a special pointer type (BlockPointer) in the frontend and therefore
it is used as a pointer everywhere. This is also needed to accommodate
both private and global AS blocks for the two cases above.

- removes ObjC RT specific symbols (NSConcreteStackBlock and
NSConcreteGlobalBlock) in the OpenCL mode.

Review: https://reviews.llvm.org/D28814
llvm-svn: 293286
2017-01-27 15:11:34 +00:00
Simon Dardis 9e962add70 [mips] Add support for static model on N64
The patch teaches the Clang driver how to handle the N64 static
relocation model properly. It enforces the correct target feature
(+noabicalls) when -fno-pic is used. This is required as non-pic
N64 code as the abi extension to call PIC code (CPIC) is unsupported.

Make PIC the default for mips64 and mips64el, this affects both N32
& N64 ABIs, to better match GCC.

As part of this effort, clean up the assembler invocation command
builder, so the correct flags are used.

This and r293279 in LLVM resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29031

llvm-svn: 293285
2017-01-27 15:05:25 +00:00
Peter Smith 5191c6f945 [ELF][ARM] Use SyntheticSections for Thunks
Thunks are now implemented by redirecting the relocation to the
symbol S, to a symbol TS in a Thunk. The Thunk will transfer control
to S. This has the following implications:
- All the side-effects of Thunks happen within createThunks()
- Thunks are no longer stored in InputSections and Symbols no longer
  need to hold a pointer to a Thunk
- The synthetic Thunk sections need to be merged into OutputSections
    
This implementation is almost a direct conversion of the existing
Thunks with the following exceptions:
- Mips LA25 Thunks are placed before the InputSection that defines
  the symbol that needs a Thunk.
- All ARM Thunks are placed at the end of the OutputSection of the
  first caller to the Thunk.
    
Range extension Thunks are not supported yet so it is optimistically
assumed that all Thunks can be reused.

Differential Revision:  https://reviews.llvm.org/D29129

llvm-svn: 293283
2017-01-27 13:10:16 +00:00
Pavel Labath 810055ee23 Fix android-i386 build broken by previous commit
I foolishly thought I could simplify the condition to cover all android
targets. I was wrong - i386 headers don't define __NR_accept.

Revert back to enabling the workaround on arm an mips only.

llvm-svn: 293282
2017-01-27 12:58:23 +00:00
Pavel Labath ef97630fd2 Refactor the android accept hack
This moves the accept hack from the android toolchain file into
LLDBConfig.cmake. This allows successful lldb android compilation
without relying on our custom toolchain file.

llvm-svn: 293281
2017-01-27 12:23:51 +00:00
Artem Dergachev 12caf8e1e6 [analyzer] Consider function call arguments while building CallGraph.
Function call can appear in the arguments of another function call, eg.:

  foo(bar());

This patch adds support for such cases.

Patch by Ivan Sidorenko!

Differential revision: https://reviews.llvm.org/D28905

llvm-svn: 293280
2017-01-27 12:14:56 +00:00
Simon Dardis ca74dd79e9 [mips] Recommit: "N64 static relocation model support"
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

This version corrects a "Conditional jump or move depends on uninitialised
value(s)" error detected by valgrind present in the original commit.

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293279
2017-01-27 11:36:52 +00:00
Eugene Leviant bcff495b85 [ELF] Fixed formatting. NFC
llvm-svn: 293278
2017-01-27 11:06:23 +00:00
Jonas Hahnfeld cfe5ef589a [libomptarget] Fix compilation with libc++
iterator is only guaranteed to be default-constructible, without any argument.

Differential Revision: https://reviews.llvm.org/D29171

llvm-svn: 293277
2017-01-27 11:03:33 +00:00
Eugene Leviant 8b7cadcf96 [ELF] Bypass section type check
Differential revision: https://reviews.llvm.org/D28761

llvm-svn: 293276
2017-01-27 11:01:43 +00:00
Simon Dardis 73190d3906 [lld][mips] Correct tests for mips64 implying PIC.
Currently LLVM can only generate PIC code for MIPS64 with the N64 as
it uses the idiom "isPositionIndependent() || IsABI_N64()" throughout the
MIPS backend. r293164 changed this, causing test failures for LLD.

This patch changes the tests minimally to preserve existing test coverage
and one case where the test was "right" in the wrong circumstance.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D29194

llvm-svn: 293275
2017-01-27 11:01:10 +00:00
Alexey Bataev 4015bf8372 [SLP] Refactoring of horizontal reduction analysis, NFC.
Some checks in SLP horizontal reduction analysis function are performed
several times, though it is enough to perform these checks only once
during an initial attempt at adding candidate for the reduction
instruction/reduced value.

Differential Revision: https://reviews.llvm.org/D29175

llvm-svn: 293274
2017-01-27 10:54:04 +00:00
Chandler Carruth fd2d7c72fc [LICM] When we are recomputing the alias sets for a subloop, we cannot
skip sub-subloops.

The logic to skip subloops dated from when this code was shared with the
cached case. Once it was factored out to only run in the case of
recomputed subloops it became a dangerous bug. If a subsubloop contained
an interfering instruction it would be silently skipped from the alias
sets for LICM.

With the old pass manager this was extremely hard to trigger as it would
require failing to visit these subloops with the LICM pass but then
visiting the outer loop somehow. I've not yet contrived any test case
that actually manages to trigger this.

But with the new pass manager we don't do the cross-loop caching hack
that the old PM does and so we recompute alias set information from
first principles. While this seems much cleaner and simpler it exposed
this bug and would subtly miscompile code due to failing to correctly
model the aliasing constraints of deeply nested loops.

llvm-svn: 293273
2017-01-27 10:27:32 +00:00
Martin Probst fa37b18f94 clang-format: [JS] do not format MPEG transport streams.
Summary:
The MPEG transport stream file format also uses ".ts" as its file extension.
This change detects its specific framing format (0x47 every 189 bytes) and
simply ignores MPEG TS files.

Reviewers: djasper, sammccall

Subscribers: klimek, cfe-commits

Differential Revision: https://reviews.llvm.org/D29186

llvm-svn: 293270
2017-01-27 09:09:11 +00:00
Boris Ulasevich 67346ca9ef Unroll r292930 due to TestCallThatThrows test fail is not fixed in reasonable time.
llvm-svn: 293269
2017-01-27 07:51:43 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Diana Picus 9141501721 Revert "Implement a new clang-tidy check that suggests users replace dynamic exception specifications with noexcept exception specifications."
This reverts commit r293217, its follow-up 293218 and part of 293234 because it
broke all bots that build clang-tools-extra.

llvm-svn: 293267
2017-01-27 07:19:22 +00:00
Adam Nemet 572fca7111 [opt-viewer] Introduce global context
This is necessary since globals (max_hotness, caller_loc) need to be
explicitly passed to the subprocesses.

llvm-svn: 293266
2017-01-27 06:39:09 +00:00
Adam Nemet 07f1264b0b [opt-viewer] Remove message from the key
This is causing problems because the rendering of the text will depend on
varying global state to show relative hotness or a link in the inlining
context.

llvm-svn: 293265
2017-01-27 06:39:08 +00:00
Adam Nemet 41cf9b271c [opt-viewer] Unique across the different jobs as well
llvm-svn: 293264
2017-01-27 06:39:06 +00:00
Adam Nemet 4f075e3c3e [opt-viewer] Make sorting for the index page deterministic
Break the tie between entries with identical hotness deterministically.

llvm-svn: 293263
2017-01-27 06:39:02 +00:00
Adam Nemet 742615e5a9 [opt-viewer] Include the function in the remark key
Avoid uniquing remarks with different the inlining context (Function).

llvm-svn: 293262
2017-01-27 06:39:01 +00:00
Adam Nemet 55bfb497d2 [opt-viewer] Put critical items in parallel
Summary:
Put opt-viewer critical items in parallel

Patch by Brian Cain!

Requires features from Python 2.7

**Performance**
Below are performance results across various configurations. These were taken on an i5-5200U (dual core + HT). They were taken with a small subset of the YAML output of building Python 3.6.0b3 with LTO+PGO. 60 YAML files.

"multiprocessing" is the current submission contents. "baseline" is as of 544f14c6b2a07a94168df31833dba9dc35fd8289 (I think this is aka r287505).

"ImportError" vs "class<...CLoader>" below are just confirming the expected configuration (with/without CLoader).

The below was measured on AMD A8-5500B (4 cores) with 224 input YAML files, showing a ~1.75x speed increase over the baseline with libYAML.  I suspect it would scale well on high-end servers.

```
**************************************** MULTIPROCESSING ****************************************
PyYAML:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
        ImportError: cannot import name CLoader
        Python 2.7.10
489.42user 5.53system 2:38.03elapsed 313%CPU (0avgtext+0avgdata 400308maxresident)k
0inputs+31392outputs (0major+473540minor)pagefaults 0swaps

PyYAML+libYAML:
        <class 'yaml.cyaml.CLoader'>
        Python 2.7.10
78.69user 5.45system 0:32.63elapsed 257%CPU (0avgtext+0avgdata 398560maxresident)k
0inputs+31392outputs (0major+542022minor)pagefaults 0swaps

PyPy/PyYAML:
        Traceback (most recent call last):
          File "<builtin>/app_main.py", line 75, in run_toplevel
          File "<builtin>/app_main.py", line 601, in run_it
          File "<string>", line 1, in <module>
        ImportError: cannot import name 'CLoader'
        Python 2.7.9 (2.6.0+dfsg-3, Jul 04 2015, 05:43:17)
        [PyPy 2.6.0 with GCC 4.9.3]
154.27user 8.12system 0:53.83elapsed 301%CPU (0avgtext+0avgdata 627960maxresident)k
808inputs+30376outputs (0major+727994minor)pagefaults 0swaps
**************************************** BASELINE        ****************************************
PyYAML:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
        ImportError: cannot import name CLoader
        Python 2.7.10
        358.08user 4.05system 6:08.37elapsed 98%CPU (0avgtext+0avgdata 315004maxresident)k
0inputs+31392outputs (0major+85252minor)pagefaults 0swaps

PyYAML+libYAML:
        <class 'yaml.cyaml.CLoader'>
        Python 2.7.10
50.32user 3.30system 0:56.59elapsed 94%CPU (0avgtext+0avgdata 307296maxresident)k
0inputs+31392outputs (0major+79335minor)pagefaults 0swaps

PyPy/PyYAML:
        Traceback (most recent call last):
          File "<builtin>/app_main.py", line 75, in run_toplevel
          File "<builtin>/app_main.py", line 601, in run_it
          File "<string>", line 1, in <module>
        ImportError: cannot import name 'CLoader'
        Python 2.7.9 (2.6.0+dfsg-3, Jul 04 2015, 05:43:17)
        [PyPy 2.6.0 with GCC 4.9.3]
72.94user 5.18system 1:23.41elapsed 93%CPU (0avgtext+0avgdata 455312maxresident)k
0inputs+30392outputs (0major+110280minor)pagefaults 0swaps

```

Reviewers: fhahn, anemet

Reviewed By: anemet

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D26967

llvm-svn: 293261
2017-01-27 06:38:31 +00:00
Richard Trieu 0b79aa3373 Fix unused variable warning.
llvm-svn: 293260
2017-01-27 06:06:05 +00:00
Saleem Abdulrasool 26c00e3700 ARM: fix vectorized division on WoA
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks.  However, it is
designed to only handle non-vectorized division.  ARM has custom
lowering for vectorized division as that can avoid loading registers
with the values and invoke a division routine for each one, preferring
to lower using NEON instructions.  Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.

Resolves PR31778!

llvm-svn: 293259
2017-01-27 03:41:53 +00:00
Daniel Berlin c479686af2 NewGVN: Add basic dead and redundant store elimination
Summary:
This adds basic dead and redundant store elimination to
NewGVN.  Unlike our current DSE, it will happily do cross-block DSE if
it meets our requirements.

We get a bunch of DSE's simple.ll cases, and some stuff it doesn't.
Unlike DSE, however, we only try to eliminate stores of the same value
to the same memory location, not just general stores to the same
memory location.

Reviewers: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29149

llvm-svn: 293258
2017-01-27 02:37:11 +00:00
Saleem Abdulrasool 2b2f4dadc4 Revert "DWARF: convert error logs to _LIBUNWIND_LOG"
This reverts SVN r292721.  Avoid the use of the GNU extension as the
preprocessor in C++11 mode requires at least one argument, and this
warning cannot be disabled, resulting in failing -Werror builds.

llvm-svn: 293257
2017-01-27 02:26:52 +00:00
NAKAMURA Takumi 0d299191d0 NVPTXCodeGen: Add IPO to libdeps, since r293189.
llvm-svn: 293256
2017-01-27 02:11:10 +00:00
Tim Shen 601ba8c583 [APFloat] Reduce some dispatch boilerplates. NFC.
Summary: This is an attempt to reduce the verbose manual dispatching code in APFloat. This doesn't handle multiple dispatch on single discriminator (e.g. APFloat::add(const APFloat&)), nor handles multiple dispatch on multiple discriminators (e.g. APFloat::convert()).

Reviewers: hfinkel, echristo, jlebar

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29161

llvm-svn: 293255
2017-01-27 02:11:07 +00:00
Richard Smith c5b2e00d06 [docs] Add help text and refine grouping for various options.
Also accept -G= (and -msmall-data-threshold=) as an alias for -G on MIPS as well as Hexagon.

llvm-svn: 293254
2017-01-27 02:08:37 +00:00
Justin Lebar 25ebe2d767 [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC.
llvm-svn: 293253
2017-01-27 02:04:07 +00:00
Richard Smith b2c82a6970 Improve workaround for Sphinx's lack of support for command line options containing '+', '.' etc. to be more stable as the set of options changes.
llvm-svn: 293252
2017-01-27 01:54:42 +00:00
Justin Lebar e3ac0fb948 [NVPTX] Fix use-after-stack-free bug in InstCombineCalls.
Introduced in r293244.

llvm-svn: 293251
2017-01-27 01:49:39 +00:00
Xin Tong e5f8d643d4 Constant fold switch inst when looking for trivial conditions to unswitch on.
Summary: Constant fold switch inst when looking for trivial conditions to unswitch on.

Reviewers: sanjoy, chenli, hfinkel, efriedma

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29037

llvm-svn: 293250
2017-01-27 01:42:20 +00:00
Chandler Carruth baabda9317 [PM] Port LoopLoadElimination to the new pass manager and wire it into
the main pipeline.

This is a very straight forward port. Nothing weird or surprising.

This brings the number of missing passes from the new PM's pipeline down
to three.

llvm-svn: 293249
2017-01-27 01:32:26 +00:00
Quentin Colombet 89dbea06f1 [ARM][LegalizerInfo] Specify the type of the opcode.
This is to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293248
2017-01-27 01:30:46 +00:00
Weiming Zhao 68e20da3f9 [Builtin][ARM] Add Thumb1 support for aeabi_c{f,d}cmp.S and dcmp.S
Reviewers: compnerd, rengolin

Reviewed By: rengolin

Subscribers: aemerson, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D28985

llvm-svn: 293247
2017-01-27 01:21:00 +00:00
Quentin Colombet 24203cf997 [AArch64][LegalizerInfo] Specify the type of the opcode.
This is an attempt to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293246
2017-01-27 01:13:30 +00:00
Quentin Colombet e15e460c05 Revert "[AArch64][LegalizerInfo] Specify the type of the initialization list."
This reverts commit r293238.
Even with that the win7 bot is still failing:
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/3862

llvm-svn: 293245
2017-01-27 01:13:25 +00:00
Justin Lebar 698c31b8db [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls.
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.

For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32.  On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction.  In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.

These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.

Reviewers: tra

Subscribers: hfinkel, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D28794

llvm-svn: 293244
2017-01-27 00:58:58 +00:00