Summary:
This is the first commit for the new Mach-O backend, designed to roughly
follow the architecture of the existing ELF and COFF backends, and
building off work that @ruiu and @pcc did in a branch a while back. Note
that this is a very stripped-down commit with the bare minimum of
functionality for ease of review. We'll be following up with more diffs
soon.
Currently, we're able to generate a simple "Hello World!" executable
that runs on OS X Catalina (and possibly on earlier OS X versions; I
haven't tested them). (This executable can be obtained by compiling
`test/MachO/relocations.s`.) We're mocking out a few load commands to
achieve this -- for example, we can't load dynamic libraries, but
Catalina requires binaries to be linked against `dyld`, so we hardcode
the emission of a `LC_LOAD_DYLIB` command. Other mocked out load
commands include LC_SYMTAB and LC_DYSYMTAB.
Differential Revision: https://reviews.llvm.org/D75382
The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2.
There 2 problems blocked the duplication:
1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts.
2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret.
The solutions:
1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it.
Differential Revision: https://reviews.llvm.org/D76539
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.
The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.
This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.
Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.
Differential Revision: https://reviews.llvm.org/D74873
Consider a callee function that has a call (C) within it which feeds
into the return. When we inline that callee into a callsite that has
return attributes, we can backward propagate those attributes to the
call (C) within that inlined callee body.
This is safe to do so only if we can guarantee transfer of execution to
successor in the window of instructions between return value (i.e. the
call C) and the return instruction.
See added test cases.
Reviewed-By: reames, jdoerfert
Differential Revision: https://reviews.llvm.org/D76140
Summary:
//reviews.llvm.org/D33035 added in 2017 basic support for intel-pt. I
plan to improve it and use it to support reverse debugging.
I fixed a couple of issues and now this plugin works again:
1. pythonlib needed to be linked against it for the SB framework.
Linking was failing because of this
2. the decoding functionality was broken because it lacked handling for
instruction events. It seems old versions of libipt, the actual decoding
library, didn't require these, but modern version require it (you can
read more here
https://github.com/intel/libipt/blob/master/doc/howto_libipt.md). These
events signal overflows of the internal PT buffer in the CPU,
enable/disable events of tracing, async cpu events, interrupts, etc.
I ended up refactoring a little bit the code to reduce code duplication.
In another diff I'll implement some basic tests.
This is a simple execution of the library:
(lldb) target create "/data/users/wallace/rr-project/a.out"
Current executable set to '/data/users/wallace/rr-project/a.out' (x86_64).
(lldb) plugin load liblldbIntelFeatures.so
(lldb) b main
Breakpoint 1: where = a.out`main + 8 at test.cpp:10, address = 0x00000000004007fa
(lldb) b test.cpp:14
Breakpoint 2: where = a.out`main + 50 at test.cpp:14, address = 0x0000000000400824
(lldb) r
Process 902754 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
frame #0: 0x00000000004007fa a.out`main at test.cpp:10
7 }
8
9 int main() {
-> 10 int z = 0;
11 for(int i = 0; i < 10000; i++)
12 z += fun(z);
13
Process 902754 launched: '/data/users/wallace/rr-project/a.out' (x86_64)
(lldb) processor-trace start all
(lldb) c
Process 902754 resuming
Process 902754 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 2.1
frame #0: 0x0000000000400824 a.out`main at test.cpp:14
11 for(int i = 0; i < 10000; i++)
12 z += fun(z);
13
-> 14 cout << z<< endl;
15 return 0;
16 }
(lldb) processor-trace show-instr-log
thread #1: tid=902754
0x7ffff72299b9 <+9>: addq $0x8, %rsp
0x7ffff72299bd <+13>: retq
0x4007ed <+16>: addl $0x1, %eax
0x4007f0 <+19>: leave
0x4007f1 <+20>: retq
0x400814 <+34>: addl %eax, -0x4(%rbp)
0x400817 <+37>: addl $0x1, -0x8(%rbp)
0x40081b <+41>: cmpl $0x270f, -0x8(%rbp) ; imm = 0x270F
0x400822 <+48>: jle 0x40080a ; <+24> at test.cpp:12
0x400822 <+48>: jle 0x40080a ; <+24> at test.cpp:12
```
Subscribers: mgorny, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D76872
Using the ADDITIONAL_COMPILE_FLAGS annotation, it is possible to move
these tests from .sh.cpp to .pass.cpp, making them suitable for running
on remote hosts more easily.
Move lower-affine.mlir from test/Transforms to
test/Conversion/AffineToStandard/. Other related NFC.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D77008
Registers used in any address (as well as in a few other contexts)
have special semantics when a "zero" register is used, which is
why the back-end defines extra register classes ADDR32, ADDR64 etc
to be used to prevent the register allocator from using %r0 there.
However, when writing assembler code "by hand", you sometimes need
to trigger that special semantics. However, currently the AsmParser
will reject %r0 in those places. In some cases it may be possible
to write that instruction differently - but in others it is currently
not possible at all.
This check in AsmParser simply seems overly strict, so this patch
just removes the check completely. This brings the behaviour of
AsmParser in line with the GNU assembler as well.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092
Make InstCombine aware of the aligned_alloc library function.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Depends on D76970.
Differential Revision: https://reviews.llvm.org/D76971
SBPlatform::GetHostPlatform was missing the reproducer instrumentation
macros. Fixed by running lldb-instr on SBPlatform.cpp:
$ ./bin/lldb-instr ../llvm-project/lldb/source/API/SBPlatform.cpp
Summary:
CGProfilePass is run by default in certain new pass manager optimization pipeline. Assemblers other than llvm as (such as gnu as) cannot recognize the .cgprofile entries generated and emitted from this pass, causing build time error.
This patch adds new options in clang CodeGenOpts and PassBuilder options so that we can turn cgprofile off when not using integrated assembler.
Reviewers: Bigcheese, xur, george.burgess.iv, chandlerc, manojgupta
Reviewed By: manojgupta
Subscribers: manojgupta, void, hiraditya, dexonsmith, llvm-commits, tcwang, llozano
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D62627
Summary:
When doing cross-compilation from Linux to MacOS we don't have
access to have access to `xcodebuild` and therefore need a way
to set the SDK version from the outside.
Fixes https://reviews.llvm.org/D68292#1853594 for me.
Reviewers: delcypher, yln
Reviewed By: delcypher
Subscribers: #julialang, mgorny, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D77026
Summary:
The basic idea is to walk through the concept definition, looking for
t.foo() where t has the constrained type.
In this patch:
- nested types are recognized and offered after ::
- variable/function members are recognized and offered after the correct
dot/arrow/colon trigger
- member functions are recognized (anything directly called). parameter
types are presumed to be the argument types. parameters are unnamed.
- result types are available when a requirement has a type constraint.
These are printed as constraints, except same_as<T> which prints as T.
Not in this patch:
- support for merging/overloading when two locations describe the same member.
The last one wins, for any given name. This is probably important...
- support for nested template members (T::x<int>)
- support for completing members of (instantiations of) template template parameters
Reviewers: nridge, saar.raz
Subscribers: mgrang, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73649
The above change used a binary literal that is not supported in c++11 mode when
using gcc. It was formalized into the c++14 standard and works when using that
mode to compile, so change the script to use c++14 instead.
Reviewed by: dvyukov
Differential Revision: https://reviews.llvm.org/D77111
Existing tiling implementation of Linalg would still work for tiling
the batch dimensions of the convolution op.
Differential Revision: https://reviews.llvm.org/D76637
Summary: this patch preserve information from various places in EarlyCSE into assume bundles.
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76769
--no-threads is a name copied from gold.
gold has --no-thread, --thread-count and several other --thread-count-*.
There are needs to customize the number of threads (running several lld
processes concurrently or customizing the number of LTO threads).
Having a single --threads=N is a straightforward replacement of gold's
--no-threads + --thread-count.
--no-threads is used rarely. So just delete --no-threads instead of
keeping it for compatibility for a while.
If --threads= is specified (ELF,wasm; COFF /threads: is similar),
--thinlto-jobs= defaults to --threads=,
otherwise all available hardware threads are used.
There is currently no way to override a --threads={1,2,...}. It is still
a debate whether we should use --threads=all.
Reviewed By: rnk, aganea
Differential Revision: https://reviews.llvm.org/D76885
This patch fixes a crash that happens on the DWARF expression evaluator
when trying to access the top of the stack while it's empty.
rdar://60512489
Differential Revision: https://reviews.llvm.org/D77108
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
Summary:
Add support for TupleGetOp folding through InsertSlicesOp and ExtractSlicesOp.
Vector-to-vector transformations for unrolling and lowering to hardware vectors
can generate chains of structured vector operations (InsertSlicesOp,
ExtractSlicesOp and ShapeCastOp) between the producer of a hardware vector
value and its consumer. Because InsertSlicesOp, ExtractSlicesOp and ShapeCastOp
are structured, we can track the location (tuple index and vector offsets) of
the consumer vector value through the chain of structured operations to the
producer, enabling a much more powerful producer-consumer fowarding of values
through structured ops and tuple, which in turn enables a more powerful
TupleGetOp folding transformation.
Reviewers: nicolasvasilache, aartbik
Reviewed By: aartbik
Subscribers: grosul1, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76889
This makes it closer to how one would run the tests by hand, and it is
also closer to how the SSHExecutor runs the tests remotely. It also
allows using shell builtins in .sh.cpp tests when using %{exec}.
This patch fixes a crash that happens on the DWARF expression evaluator
when trying to access the top of the stack while it's empty.
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This is the Waymarking algorithm implemented as an independent utility.
The utility is operating on a range of sequential elements.
First we "tag" the elements, by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.
Differential Revision: https://reviews.llvm.org/D74415
Based on the current discussion in https://llvm.org/PR45307, it seems
that it's legitimate for `temp_directory_path()` to return a path with
a trailing slash. Since `p.parent_path()` will never contain a trailing
slash, comparing it to the result of `temp_directory_path()` will fail
depending on whether `temp_directory_path()` returns a trailing slash
or not.
If canLowerByDroppingEvenElements indicates that the shuffle is a N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact.
This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask can recognise PACKSS/PACKUS chains.
Otherwise, trying to reproduce a failing filesystem test by copy-pasting
the command-line used and running that in the shell won't work, because
the shell will eat quoting around the define and we'll end up with a
non-stringized path in the .cpp file.
That way, local lit configuration files don't have to worry about
deep-copying the compiler instance of the test format, which is
arguably an implementation detail.
We pass the config to this method even though it is not used by the
current test format because this allows replacing the current test
format by other test formats that would require the config to add
new compile flags.
This reduces the complexity of our already complex global lit configuration,
and also avoids cluttering the compilation commands for all tests with
things that are only relevant to the filesystem tests.
Differential Revision: https://reviews.llvm.org/D76785
The script now includes extra info about command-line options used
when generating its advertisement heading, but we don't want that
here. This is a special-case because we have enhanced the check
lines (as noted in the 2nd comment line).