Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.
amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
some limit
2) It increases the threshold if there are pointers to private arrays(?)
These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.
This way we can remove the custom amdgpu-inline pass.
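For reference, the existing multiplier hook is an ordinary TTI override. A
minimal sketch for a hypothetical target (the class name and the value 3 are
illustrative only; real targets usually derive from BasicTTIImplBase, but the
hook override looks the same):

```
#include "llvm/Analysis/TargetTransformInfoImpl.h"

using namespace llvm;

// Hypothetical target "Foo", shown only to illustrate the hook.
class FooTTIImpl : public TargetTransformInfoImplCRTPBase<FooTTIImpl> {
  using BaseT = TargetTransformInfoImplCRTPBase<FooTTIImpl>;

public:
  explicit FooTTIImpl(const DataLayout &DL) : BaseT(DL) {}

  // The existing hook: scale the generic inlining threshold for this
  // target. The value 3 is purely illustrative.
  unsigned getInliningThresholdMultiplier() const { return 3; }
};
```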
This change caused inline-hint.ll to fail. After some investigation, it
looks like getInliningThresholdMultiplier() was previously being applied
twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not being
applied at all, so some later inliner change must have fixed something),
so I had to adjust the threshold in the test.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D94153
This patch renames the tablegen-generated file ACC.cpp.inc to ACC.inc in
order to match what was done in D92955. The file is included in a header
file as well as a .cpp file, so the new name makes more sense.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D93485
- Merge 6706342f48 -- no more libcxx_needs_site_config, we now
always need it
- Since it was always off in practice, write_config bitrot. Unbitrot
it so that it works
- Remove copy step and let concat step write to final location
immediately -- and fix copy destination directory
As a side effect, libcxx/include/BUILD.gn now has only a single
sources list, which means the cmake sync script should be able to
automatically sync additions and removals of .h files. On the flip side,
this means this file now must be updated after most changes to
libcxx/include/__config_site.in; looking through the last few months
of changes, this looks like it will be a wash.
use_lld defaults to true on non-mac if clang_base_path is set (i.e.
the host compiler is a locally-built clang). On mac, the lld Mach-O
port used to be unusable, but ld64.lld.darwinnew is close to usable.
When explicitly setting `use_lld = true` in a GN build on a mac host,
check-lld passes, two check-clang tests fail, and a handful of check-llvm
tests fail (the latter all due to -flat_namespace not yet being implemented).
I noticed __availability was missing, so I manually diffed the
file lists and put in all recently(ish) added headers:
* __availability from 2eadbc8614
* concepts from 601f763182
* execution from 0a06eb911b
* numbers from 4f6c4b473c
Also remove libcxx_install_support_headers like the CMake build did in
6706342f48, and unconditionally copy
support/win32/{limits_msvc_win32.h,locale_win32.h} like the CMake
build always did as far as I can tell.
This will hopefully fix the build never becoming clean when using Ninja
1.9+. Ninja 1.9 enabled high-resolution timestamps, but pax doesn't
correctly set high-resolution timestamps on its output.
See https://github.com/nico/hack/blob/master/notes/copydir.md for a
detailed writeup of the problem and alternatives.
And add them to the pipeline via
AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors
AMDGPUTargetMachine::adjustPassManager().
These passes can't be unconditionally added to PassRegistry.def since
they are only present when the AMDGPU backend is enabled. And there are
no target-specific headers in llvm/include, so parsing these pass names
must occur somewhere in the AMDGPU directory. I decided the best place
was inside the TargetMachine, since the PassBuilder invokes
TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with
a cleaner solution for target-specific passes in the future, that's fine;
there aren't too many target-specific IR passes living in target-specific
directories, so it shouldn't be too bad to change later.
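For illustration, the registration pattern looks roughly like this (the
target machine, pass class, and pipeline name below are hypothetical
placeholders, and the override signature is simplified; the patch wires up
the actual AMDGPU passes):

```
#include "llvm/Passes/PassBuilder.h"

using namespace llvm;

// Hypothetical target machine and pass, shown only to illustrate the
// registerPassBuilderCallbacks() mechanism described above.
void FooTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
  PB.registerPipelineParsingCallback(
      [](StringRef Name, ModulePassManager &MPM,
         ArrayRef<PassBuilder::PipelineElement>) {
        if (Name == "foo-ir-pass") {
          MPM.addPass(FooIRPass());
          return true;
        }
        return false;
      });
}
```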
Reviewed By: ychen, arsenm
Differential Revision: https://reviews.llvm.org/D93863
For full-debug-info (is_debug=true / symbol_level=2 builds), this makes
linking 15% slower, but gdb startup 1500% faster (for lld: link time
3.9s->4.4s, gdb load time >30s->2s).
For link time, I ran
    bench.py -o {noindex,index}.txt \
        sh -c 'rm out/gn/bin/lld && ninja -C out/gn lld'
and then `ministat noindex.txt index.txt`:
```
x noindex.txt
+ index.txt
    N           Min           Max        Median           Avg        Stddev
x   5      3.784461     4.0200169     3.8452811     3.8754988   0.089902595
+   5       4.32496     4.6058481     4.3361208     4.4141198    0.12288267
Difference at 95.0% confidence
        0.538621 +/- 0.15702
        13.8981% +/- 4.05161%
        (Student's t, pooled s = 0.107663)
```
For gdb load time I loaded the crash in PR48392 with
    gdb -ex r --args ../out/gn/bin/ld64.lld.darwinnew @response.txt
and timed with a stopwatch how long it took until the crash was displayed,
a few times. So the measurement there is less precise, but the speedup is
so pronounced that that's ok (loads ~instantly with the patch, takes a
very long time without it).
Only doing this for LLD because I haven't tried it with other linkers.
Differential Revision: https://reviews.llvm.org/D92844
is_debug by default sets symbol_level = 2, and !is_debug by default sets
symbol_level = 0.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92958
Add a mir-check-debug pass to check MIR-level debug info.
At the IR level, LLVM currently has debugify + check-debugify to generate
and check debug IR. Much like the IR-level debugify pass, mir-debugify
inserts sequentially increasing line locations into each MachineInstr in a
Module, but there is no equivalent MIR-level check-debugify pass, so we now
support that as mir-check-debug.
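For context, a rough sketch of the core idea mir-check-debug relies on (the
same kind of synthetic locations mir-debugify attaches); names here are
illustrative, not the patch's actual code:

```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Hedged sketch: attach synthetic, sequentially increasing line locations
// to every MachineInstr so a checking pass can later verify that a MIR
// transformation preserved them.
static void attachSyntheticLocations(MachineFunction &MF, DISubprogram *SP) {
  LLVMContext &Ctx = MF.getFunction().getContext();
  unsigned NextLine = 1;
  for (MachineBasicBlock &MBB : MF)
    for (MachineInstr &MI : MBB)
      MI.setDebugLoc(DILocation::get(Ctx, NextLine++, /*Column=*/1, SP));
}
```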
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595