The default behavior of bugpoint is to print "<crash>" when it finds a reduced
test that crashes compilation. With this flag we now can see the output of the
crashing program. This is useful to make sure it is the same error being
tracked down and not a different error that happens to crash the compiler as
well.
Differential Revision: https://reviews.llvm.org/D22411
llvm-svn: 275646
Summary:
This patch replaces the CUDA specific action by a generic offload action. The offload action may have multiple dependences classier in “host” and “device”. The way this generic offloading action is used is very similar to what is done today by the CUDA implementation: it is used to set a specific toolchain and architecture to its dependences during the generation of jobs.
This patch also proposes propagating the offloading information through the action graph so that that information can be easily retrieved at any time during the generation of commands. This allows e.g. the "clang tool” to evaluate whether CUDA should be supported for the device or host and ptas to easily retrieve the target architecture.
This is an example of how the action graphs would look like (compilation of a single CUDA file with two GPU architectures)
```
0: input, "cudatests.cu", cuda, (host-cuda)
1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
2: compiler, {1}, ir, (host-cuda)
3: input, "cudatests.cu", cuda, (device-cuda, sm_35)
4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_35)
5: compiler, {4}, ir, (device-cuda, sm_35)
6: backend, {5}, assembler, (device-cuda, sm_35)
7: assembler, {6}, object, (device-cuda, sm_35)
8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {7}, object
9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {6}, assembler
10: input, "cudatests.cu", cuda, (device-cuda, sm_37)
11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_37)
12: compiler, {11}, ir, (device-cuda, sm_37)
13: backend, {12}, assembler, (device-cuda, sm_37)
14: assembler, {13}, object, (device-cuda, sm_37)
15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {14}, object
16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_37)" {13}, assembler
17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
18: offload, "host-cuda (powerpc64le-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
19: backend, {18}, assembler
20: assembler, {19}, object
21: input, "cuda", object
22: input, "cudart", object
23: linker, {20, 21, 22}, image
```
The changes in this patch pass the existent regression tests (keeps the existent functionality) and resulting binaries execute correctly in a Power8+K40 machine.
Reviewers: echristo, hfinkel, jlebar, ABataev, tra
Subscribers: guansong, andreybokhanko, tcramer, mkuron, cfe-commits, arpith-jacob, carlo.bertolli, caomhin
Differential Revision: https://reviews.llvm.org/D18171
llvm-svn: 275645
* include CheckAtomic to set HAVE_CXX_ATOMICS64_WITHOUT_LIB properly (introduced in r274121)
* hint Clang CMake files for LLVM CMake files location (inctroduced in ~ r274176)
Differential revision: https://reviews.llvm.org/D22322
llvm-svn: 275641
Add an option to specify a symbol demangler (as well as options to the
demangler). This can be used to make reports more human-readable.
This option is especially useful in -output-dir mode, since it isn't as
easy to manually pipe reports into a demangler in this mode.
llvm-svn: 275640
In this situation:
%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2
The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.
llvm-svn: 275635
Previously, we would expand:
%BL<def> = COPY %DL<kill>, %EBX<imp-use,kill>, %EBX<imp-def>
Into:
%BL<def> = MOV8rr %DL<kill>, %EBX<imp-def>
Dropping the imp-use on the floor.
That confused CriticalAntiDepBreaker, which (correctly) assumes that if an
instruction defs but doesn't use a register, that register is dead immediately
before the instruction - while in this case, the high lanes of EBX can be very
much alive.
This fixes PR28560.
Differential Revision: https://reviews.llvm.org/D22425
llvm-svn: 275634
The same value for EM_BPF is being propagated to glibc,
elfutils, and binutils.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 275633
Block 1 and 2 of an MSF file are bit vectors that represent the
list of blocks allocated and free in the file. We had been using
these blocks to write stream data and other data, so we mark them
as the free page map now. We don't yet serialize these pages to
the disk, but at least we make a note of what it is, and avoid
writing random data to them.
Doing this also necessitated cleaning up some of the tests to be
more general and hardcode fewer values, which is nice.
llvm-svn: 275629
Previously we would read a PDB, then write some of it back out,
but write the directory, super block, and other pertinent metadata
back out unchanged. This generates incorrect PDBs since the amount
of data written was not always the same as the amount of data read.
This patch changes things to use the newly introduced `MsfBuilder`
class to write out a correct and accurate set of Msf metadata for
the data *actually* written, which opens up the door for adding and
removing type records, symbol records, and other types of data to
an existing PDB.
llvm-svn: 275627
The Hexagon schedulers need to handle instructions with a latency
of 0 or 2 more accurately. The problem, in v60, is that a dependence
between two instructions with a 2 cycle latency can use a .cur version
of the source to achieve a 0 cycle latency when the use is in the
same packet. Any othe use, must be at least 2 packets later, or a
stall occurs. In other words, the compiler does not want to schedule
the dependent instructions 1 cycle later.
To achieve this, the latency adjustment code allows only a single
dependence to have a zero latency. All other instructions have the
other value, which is typically 2 cycles. We use a heuristic to
determine which instruction gets the 0 latency.
The Hexagon machine scheduler was also changed to increase the cost
associated with 0 latency dependences than can be scheduled in the
same packet.
Patch by Brendon Cahoon.
llvm-svn: 275625
Remove unnecessary clutter in assembly output. When using SjLj EH, the CFI is
not actually used for anything. Do not emit the CFI needlessly. The minor test
adjustments are interesting. The prologue test was just overzealous matcching.
The interesting case is the LSDA change. It was originally added to ensure that
various compilations did not mangle the name (it explicitly checked the name!).
However, subsequent cleanups made it more reliant on the CFI to find the name.
Parse the generated code flow to generically find the label still.
llvm-svn: 275614
Summary:
When a pass tries to keep LCSSA form it's often convenient to be able to update
LCSSA for a set of instructions rather than for the entire loop. This patch makes the
processInstruction from LCSSA externally available under a name
formLCSSAForInstruction.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D22378
llvm-svn: 275613
they're redeclarations. This is necessary in order for name lookup to correctly
find the most recent declaration of the name (which affects default template
argument lookup and cross-module merging, among other things).
llvm-svn: 275612
Extend the __declspec(dll*) attribute to cover ObjC interfaces. This was
requested by Microsoft for their ObjC support. Cover both import and export.
This only adds the semantic analysis portion of the support, code-generation
still remains outstanding. Add some basic initial documentation on the
attributes that were previously empty. Tweak the previous tests to use the
relative expected-warnings to make the tests easier to read.
llvm-svn: 275610
This mostly just works.
Vectorcall rets are still not supported.
The win64_eh test change is because fast isel doesn't use rsi for temporary
computations, so it doesn't need to be pushed. The test case I'm changing was
originally added to test pushes, but by now there are other test cases in that
file exercising that code path.
https://reviews.llvm.org/D22422
llvm-svn: 275607
This patch adds proper handling of stratified attributes into our
anders-style CFLAA implementation. It also comes bundled with more
CFLAnders tests. :)
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22325
llvm-svn: 275604
This adds an incomplete anders-style implementation for CFLAA. It's
incomplete in that it's missing interprocedural analysis, attrs
handling, etc. and that it needs more tests. More tests and features
will be added in future commits.
Patch by Jia Chen.
Differential Revision: https://reviews.llvm.org/D22291
llvm-svn: 275602