I thought for a while about how to remove it, but it looks like we
can just copy the file for now. Of course I'm not happy about that,
but it's just less than 50 lines of code, and we already have
duplicate code in Error.h and some other places. I want to solve
them all at once later.
Differential Revision: https://reviews.llvm.org/D27819
llvm-svn: 290062
This change seems to make LLD 0.6% faster when linking Clang with
debug info. I don't want us to have lots of local optimizations,
but this function is very hot, and the improvement is small but
not negligible, so I think it's worth doing.
llvm-svn: 288757
Some elf producers (dtrace) put this flag in relocation sections and
some (MC) don't. If we don't ignore the flag we end up with multiple
relocation sections poiting to the same section, which we don't
support.
llvm-svn: 288585
When -O0 is specified, we do not do section merging.
Though before this patch several sections were generated instead
of single, what is useless.
Differential revision: https://reviews.llvm.org/D27041
llvm-svn: 288151
The MipsGotSection::getPageEntryOffset calculates index of GOT entry
with a "page" address. Previously this method changes the state
of MipsGotSection because it modifies PageIndexMap field. That leads
to the unpredictable results if getPageEntryOffset called from multiple threads.
The patch makes getPageEntryOffset constant. To do so it calculates GOT
entry index but does not update PageIndexMap field. Later in the
MipsGotSection::writeTo method linker calculates "page" addresses and
writes them to the output.
llvm-svn: 288129
They return new vectors, but at the same time they mutate other vectors,
so returning values doesn't make much sense. We should just mutate two
vectors.
llvm-svn: 287979
Uncompressing section contents and spliting mergeable section contents
into smaller chunks are heavy tasks. They scan entire section contents
and do CPU-intensive tasks such as uncompressing zlib-compressed data
or computing a hash value for each section piece.
Luckily, these tasks are independent to each other, so we can do that
in parallel_for_each. The number of input sections is large (as opposed
to the number of output sections), so there's a large parallelism here.
Actually the current design to call uncompress() and splitIntoPieces()
in batch was chosen with doing this in mind. Basically what we need to
do here is to replace `for` with `parallel_for_each`.
It seems this patch improves latency significantly if linked programs
contain debug info (which in turn contain lots of mergeable strings.)
For example, the latency to link Clang (debug build) improved by 20% on
my machine as shown below. Note that ld.gold took 19.2 seconds to do
the same thing.
Before:
30801.782712 task-clock (msec) # 3.652 CPUs utilized ( +- 2.59% )
104,084 context-switches # 0.003 M/sec ( +- 1.02% )
5,063 cpu-migrations # 0.164 K/sec ( +- 13.66% )
2,528,130 page-faults # 0.082 M/sec ( +- 0.47% )
85,317,809,130 cycles # 2.770 GHz ( +- 2.62% )
67,352,463,373 stalled-cycles-frontend # 78.94% frontend cycles idle ( +- 3.06% )
<not supported> stalled-cycles-backend
44,295,945,493 instructions # 0.52 insns per cycle
# 1.52 stalled cycles per insn ( +- 0.44% )
8,572,384,877 branches # 278.308 M/sec ( +- 0.66% )
141,806,726 branch-misses # 1.65% of all branches ( +- 0.13% )
8.433424003 seconds time elapsed ( +- 1.20% )
After:
35523.764575 task-clock (msec) # 5.265 CPUs utilized ( +- 2.67% )
159,107 context-switches # 0.004 M/sec ( +- 0.48% )
8,123 cpu-migrations # 0.229 K/sec ( +- 23.34% )
2,372,483 page-faults # 0.067 M/sec ( +- 0.36% )
98,395,342,152 cycles # 2.770 GHz ( +- 2.62% )
79,294,670,125 stalled-cycles-frontend # 80.59% frontend cycles idle ( +- 3.03% )
<not supported> stalled-cycles-backend
46,274,151,813 instructions # 0.47 insns per cycle
# 1.71 stalled cycles per insn ( +- 0.47% )
8,987,621,670 branches # 253.003 M/sec ( +- 0.60% )
148,900,624 branch-misses # 1.66% of all branches ( +- 0.27% )
6.747548004 seconds time elapsed ( +- 0.40% )
llvm-svn: 287946
The function was used only within Relocations.cpp, but now we are
using it in many places, so this patch moves it to a file that fits
to the functionality.
llvm-svn: 287943
Offset between beginning of a .got section and _gp symbols used in MIPS
GOT relocations calculations. Usually the expression looks like
VA + Offset - GP, where VA is the .got section address, Offset - offset
of the GOT entry, GP - offset between .got and _gp. Also there two "magic"
symbols _gp_disp and __gnu_local_gp which hold the offset mentioned above.
These symbols might be referenced by MIPS relocations.
Now the linker always defines _gp symbol and uses hardcoded value for
its initialization. So offset between .got and _gp is 0x7ff0. The _gp_disp
and __gnu_local_gp defined if required and initialized by 0x7ff0.
In fact that is not correct because _gp symbol might be defined by a linker
script and holds arbitrary value. In that case we need to use this value
in relocation calculation and initialize _gp_disp and __gnu_local_gp
properly.
The patch fixes the problem and completes fixing the bug #30311.
https://llvm.org/bugs/show_bug.cgi?id=30311
Differential revision: https://reviews.llvm.org/D27036
llvm-svn: 287832
We have different functions to stringize objects to construct
error messages. For InputFile, we have getFilename, and for
InputSection, we have getName. You had to memorize them.
I think this is the case where the function overloading comes in handy.
This patch defines toString() functions that are overloaded for all these
types, so that you just call it in error().
Differential Revision: https://reviews.llvm.org/D27030
llvm-svn: 287787
MergeOutputSection class was a bit hard to use because it provdes
a series of finalize functions that have to be called in a right way
at a right time. It also intereacted with MergeInputSection, and the
logic was somewhat entangled between the two classes.
This patch simplifies it by providing only one finalize function.
Now, all you have to do is to call MergeOutputSection::finalize
when you have added all sections to the output section. Then, it
internally merges strings and initliazes StringPiece objects.
I think this is much easier to understand.
This patch also adds comments.
llvm-svn: 287314
MIPS GOT handling is very different from other targets so it is better
to keep the code in the separatre section class MipsGotSection. This
patch introduces the new section and moves all MIPS specific code from
GotSection to the new class. I did not rename fields and methods in the
MipsGotSection class to reduce the diff and plan to do that by the
separate commit.
Differential revision: https://reviews.llvm.org/D26733
llvm-svn: 287150
Relocations are the last thing that we wore storing a raw section
pointer to and parsing on demand.
With this patch we parse it only once and store a pointer to the
actual data.
The patch also changes where we store it. It is now in
InputSectionBase. Not all sections have relocations, but most do and
this simplifies the logic. It also means that we now only support one
relocation section per section. Given that that constraint is
maintained even with -r with gold bfd and lld, I think it is OK.
llvm-svn: 286459
Previously, we have both input and output section for .MIPS.abiflags.
Now we have only one class for .MIPS.abiflags, which is MipsAbiFlagsSection.
This class is a synthetic input section.
.MIPS.abiflags sections are handled as regular sections until
the control reaches Writer. Writer then aggregates all sections
whose type is SHT_MIPS_ABIFLAGS to create a single synthesized
input section. The synthesized section is then processed normally
as if it came from an input file.
llvm-svn: 286398
Previously, we have both input and output sections for .reginfo and
.MIPS.options. Now for each such sections we have one synthetic input
sections: MipsReginfoSection and MipsOptionsSection respectively.
Both sections are handled as regular sections until the control reaches
Writer. Writer then aggregates all sections whose type is SHT_MIPS_REGINFO
or SHT_MIPS_OPTIONS to create a single synthesized input section. In that
moment Writer also save GP0 value to the MipsGp0 field of the corresponding
ObjectFile. This value required for R_MIPS_GPREL16 and R_MIPS_GPREL32
relocations calculation.
Differential revision: https://reviews.llvm.org/D26444
llvm-svn: 286397
The ARM 32 and 64-bit ABI does not use 0 for undefined weak references
that are used in PC relative relocations. In particular:
- A branch relocation to an undefined weak resolves to the next
instruction. Effectively making the branch a no-op
- In all other cases the symbol resolves to the place so that S + A - P
resolves to A.
Differential Revision: https://reviews.llvm.org/D26240
llvm-svn: 286353
This is similar to what was done for InputSection.
With this the various fields are stored in host order and only
converted to target order when writing.
llvm-svn: 286327
A CommonInputSection is a section containing all common symbols.
That was an input section but was abstracted in a different way
than the synthetic input sections because it was written before
the synthetic input section was invented.
This patch rewrites CommonInputSection as a synthetic input section
so that it behaves better with other sections.
llvm-svn: 286053
We are going to have many more classes for linker-synthesized
input sections, so it's worth to be added to a separate file
than to the file for regular input sections.
llvm-svn: 285740
The example reported in PR30793 shows a case where gc reclaims
a SHF_TLS section, but it doesn't reclaim the section containing
the debug info for it.
This is expected, as we do not reclaim non-alloc sections
during the garbage collection phase (and this is not going to
change anytime soon, at least this is what I gathered last I
talked with Rafael about it).
So, we end up with a pending reference, thinking that the input
was invalid (which is not true, as it's GC that removed the
SHT_TLS section, and therefore didn't create the PT_TLS *segment*
for it). In cases like this, just assign a VA of zero at relocation
time instead of error'ing out (this is what gold does as well, FWIW).
Differential Revision: https://reviews.llvm.org/D26201
llvm-svn: 285735
Instead of storing a pointer, store the members we need.
The reason for doing this is that it makes it far easier to create
synthetic sections. It also avoids reading data from files multiple
times., which might help with cross endian linking and host
architectures with slow unaligned access.
There are obvious compacting opportunities, but this already has mixed
results even on native x86_64 linking.
There is also the possibility of better refactoring the code for
handling common symbols, but this already shows that a custom class is
not necessary.
llvm-svn: 285148
We were fairly inconsistent as to what information should be accessed
with getSectionHdr and what information (like alignment) was stored
elsewhere.
Now all section info has a dedicated getter. The code is also a bit
more compact.
llvm-svn: 285079
Some MIPS relocations used to access GOT entries are able to manipulate
16-bit index. The other ones like R_MIPS_CALL_HI16/LO16 can handle
32-bit indexes. 16-bit relocations are generated by default. The 32-bit
relocations are generated by -mxgot flag passed to compiler. Usually
these relocation are not mixed in the same code but files like crt*.o
contain 16-bit relocations so even if all "user's" code compiled with
-mxgot flag a few 16-bit relocations might come to the linking phase.
Now LLD does not differentiate local GOT entries accessed via a 16-bit
and 32-bit indexes. That might lead to relocation's overflow if 16-bit
entries are allocated to far from the beginning of the GOT.
The patch introduces new "part" of MIPS GOT dedicated to the local GOT
entries accessed by 32-bit relocations. That allows to put local GOT
entries accessed via a 16-bit index first and escape relocation's overflow.
Differential revision: https://reviews.llvm.org/D25833
llvm-svn: 284809
The R_ARM_PREL31 and R_ARM_NONE relocations should not be faulted in
shared libraries. In the case of R_ARM_NONE, we have moved the TLS
relaxation hint instruction to R_TLSDESC_CALL so that R_HINT can be used
without side-effects. In the case of R_ARM_PREL31 we permit it to be used
against PLT entries as the personality routines are imported when used in
shared libraries.
Differential Revision: https://reviews.llvm.org/D25721
llvm-svn: 284710
Even with the hash table cache, binary search was still pretty
hot. This can be made even faster with prefetching.
Idea from http://cglab.ca/~morin/misc/arraylayout-v2/
I will suggest moving this to llvm.
llvm-svn: 284594
Summary:
Reclaiming the name 'CachedHashString' will let us add a type with that
name that owns its value.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25644
llvm-svn: 284434
Previously, we supported only SHF_COMPRESSED sections because it's
new and it's the ELF standard. But there are object files compressed
in the GNU style out there, so we had to support it.
Sections compressed in the GNU style start with ".zdebug_" and
contain different headers than the ELF standard's one. In this
patch, getRawCompressedData is responsible to handle it.
A tricky thing about GNU-style compressed sections is that we have
to rename them when creating output sections. ".zdebug_" prefix
implies the section is compressed. We need to rename ".zdebug_"
".debug" because our output sections are not compressed.
We do that in this patch.
llvm-svn: 284068
The .ARM.exidx sections contain a table. Each entry has two fields:
- PREL31 offset to the function the table entry describes
- Action to take, either cantunwind, inline unwind, or PREL31 offset to
.ARM.extab section
The table entries must be sorted in order of the virtual addresses the
first entry of the table describes. Traditionally this is implemented by
the SHF_LINK_ORDER dependency. Instead of implementing this directly we
sort the table entries post relocation.
The .ARM.exidx OutputSection is described by the PT_ARM_EXIDX program
header
Differential revision: https://reviews.llvm.org/D25127
llvm-svn: 283730
I found that this check still may be useful in some cases.
At fact since we use uint32_t alignment, then maximum value
that is valid for us is 0x80000000. But some broken files,
for example file from testcase may have greater value.
Because of that offset calculation overflow and crash happens.
Differential revision: https://reviews.llvm.org/D25324
llvm-svn: 283544
This spreads out computing the hash and using it in a hash table. The
speedups are:
firefox
master 6.811232891
patch 6.559280249 1.03841162939x faster
chromium
master 4.369323666
patch 4.33171853 1.00868134338x faster
chromium fast
master 1.856679971
patch 1.850617741 1.00327578725x faster
the gold plugin
master 0.32917962
patch 0.325711944 1.01064645023x faster
clang
master 0.558015452
patch 0.550284165 1.01404962652x faster
llvm-as
master 0.032563515
patch 0.032152077 1.01279662275x faster
the gold plugin fsds
master 0.356221362
patch 0.352772162 1.00977741549x faster
clang fsds
master 0.635096494
patch 0.627249229 1.01251060127x faster
llvm-as fsds
master 0.030183188
patch 0.029889544 1.00982430511x faster
scylla
master 3.071448906
patch 2.938484138 1.04524944215x faster
This seems to be because we don't stall as much. When linking firefox
stalled-cycles-frontend goes from 57.56% to 55.55%.
With -O2 the difference is even more significant since we avoid
recomputing the hash. For firefox we go from 9.990295265 to
9.149627521 seconds (1.09x faster).
llvm-svn: 283367
It is pretty easy to get the data from the InputSection, so we don't
have to store it.
This opens the way for storing the hash instead.
llvm-svn: 283357
Previously lld would hang in infinite loop in this case,
patch fixes the issue. Object was found during AFL run.
Differential revision: https://reviews.llvm.org/D25229
llvm-svn: 283208
Follow-up to r282716. Reject input files with non-zero GP0 value only in
case of relocatable object generation. In other case we can handle
arbitrary GP0 value so it does not have a sense to make the restriction
so wide.
llvm-svn: 283194
Case was revealed by id_000010,sig_08,src_000000,op_havoc,rep_4 from PR30540.
Out implementation uses uint32 for storing section alignment value,
what seems reasonable, though if value exceeds 32 bits bounds we have
truncation and final value of 0.
Patch fixes the issue.
Differential revision: https://reviews.llvm.org/D25082
llvm-svn: 283097
We would crash when a non-alloca section pointed to a gced part of a
merge section.
That can happen when a C/c++ constant in put in a merge section and
debug info is present.
llvm-svn: 282845
LLD does not update relocations addends when generate a relocatable
object. That is why we should not write a non-zero GP0 value into
the .reginfo and .MIPS.options sections. And we should not accept input
object files with non-zero GP0 value because we cannot handle them
properly.
llvm-svn: 282716
If we pass --gc-sections to lld and .tbss is not referenced,
the section is reclaimed and lld doesn't create a TLS program header.
R_TLS tries to access the program header -> lld crashes.
Mimic what bfd/gold do in this case and resolve a weak undefined
TLS symbol to the base of the TLS block, i.e. give it a value of zero.
Differential Revision: https://reviews.llvm.org/D24832
llvm-svn: 282279
This simplifies error handling as there is now only one place in the
code that needs to consider the possibility that the name is
corrupted. Before we would do it in every access.
llvm-svn: 280937
Previously we used LayoutInputSection class to correctly assign
symbols defined in linker script. This patch removes it and uses
pointer to preceding input section in SymbolAssignment class instead.
Differential revision: https://reviews.llvm.org/D23661
llvm-svn: 280348
This is fix for PR28976.
Problem was that in scanRelocs, we computed relocation offset too early
for case when linkerscript was used. Patch fixes the issue
delaying the calculation.
Differential revision: https://reviews.llvm.org/D23655
llvm-svn: 279264
This section supersedes .reginfo and .MIPS.options sections. But for now
we have to support all three sections for ABI transition period.
llvm-svn: 278482
All other singleton instances are accessible globally.
CommonInputSection shouldn't be an exception.
Differential Revision: https://reviews.llvm.org/D22935
llvm-svn: 277034
Not all relocations from a .eh_frame that point to an executable
section should be ignored. In particular, the relocation finding the
personality function should not.
This is a reduction from trying to bootstrap a static lld on linux.
llvm-svn: 276329
We no longer need it for relocations in .eh_frame.
The only relocations that point to .eh_frame are the ones trying to
find the output .eh_frame.
This actually fixes a bug in the symbol value code. It was not
handling -1 as an indicator for a piece not being included in the
output.
llvm-svn: 276175
We will need to do something like this to support range extension
thunks since that process is iterative.
Doing this also has the advantage that when doing the regular
relocation scan the offset in the output section is known and we can
just store that. This reduces the number of times we have to run
getOffset and I think will allow a more specialized .eh_frame
representation.
By itself this is already a performance win.
firefox
master 7.295045737
patch 7.209466989 0.98826892235
chromium
master 4.531254468
patch 4.509221804 0.995137623774
chromium fast
master 1.836928973
patch 1.823805241 0.992855612714
the gold plugin
master 0.379768791
patch 0.380043405 1.00072310839
clang
master 0.642698284
patch 0.642215663 0.999249070657
llvm-as
master 0.036665467
patch 0.036456225 0.994293213284
the gold plugin fsds
master 0.40395817
patch 0.404384555 1.0010555177
clang fsds
master 0.722045545
patch 0.720946135 0.998477367518
llvm-as fsds
master 0.03292646
patch 0.032759965 0.994943428477
scylla
master 3.427376378
patch 3.368316181 0.98276810292
llvm-svn: 276146
Creating sections on linkerscript side requires some methods
that can be reused if are exported from writer.
Patch implements that change.
Differential revision: http://reviews.llvm.org/D20104
llvm-svn: 275162
The TinyPtrVector of const Thunk<ELFT>* in InputSections.h can cause
build failures on certain compiler/library combinations when Thunk<ELFT>
is not a complete type or is an abstract class. Fixed by making Thunk<ELFT>
non Abstract.
type or is an abstract class
llvm-svn: 274863
Generalise the Mips LA25 Thunk code and implement ARM and Thumb
interworking Thunks.
- Introduce a new module Thunks.cpp to store the Target Specific Thunk
implementations.
- DefinedRegular and Shared have a ThunkData field to record Thunk.
- A Target can have more than one type of Thunk.
- Support PC-relative calls to Thunks.
- Support Thunks to PLT entries.
- Existing Mips LA25 Thunk code integrated.
- Support for ARMv7A interworking Thunks.
Limitations:
- Only one Thunk per SymbolBody, this is sufficient for all currently
implemented Thunks.
- ARM thunks assume presence of V6T2 MOVT and MOVW instructions.
Differential revision: http://reviews.llvm.org/D21891
llvm-svn: 274836
Previously, ch_size was read in host byte order, so if a host and
a target are different in byte order, we would produce a corrupted
output.
llvm-svn: 274729
Patch implements support of zlib style compressed sections.
SHF_COMPRESSED flag is used to recognize that decompression is required.
After that decompression is performed and flag is removed from output.
Differential revision: http://reviews.llvm.org/D20272
llvm-svn: 273661
The patch adds one more partition to the MIPS GOT. This time it is for
TLS related GOT entries. Such entries are located after 'local' and 'global'
ones. We cannot get a final offset for these entries at the time of
creation because we do not know size of 'local' and 'global' partitions.
So we have to adjust the offset later using `getMipsTlsOffset()` method.
All MIPS TLS relocations which need GOT entries operates MIPS style GOT
offset - 'offset from the GOT's beginning' - MipsGPOffset constant. That
is why I add new types of relocation expressions.
One more difference from othe ABIs is that the MIPS ABI does not support
any TLS relocation relaxations. I decided to make a separate function
`handleMipsTlsRelocation` and put MIPS TLS relocation handling code
there. It is similar to `handleTlsRelocation` routine and duplicates its
code. But it allows to make the code cleaner and prevent pollution of
the `handleTlsRelocation` by MIPS 'if' statements.
Differential Revision: http://reviews.llvm.org/D21606
llvm-svn: 273569
Peter Smith found while trying to support thunk creation for ARM that
LLD sometimes creates broken thunks for MIPS. The cause of the bug is
that we assign file offsets to input sections too early. We need to
create all sections and then assign section offsets because appending
thunks changes file offsets for all following sections.
This patch separates the pass to assign file offsets from thunk
creation pass. This effectively reverts r265673.
Differential Revision: http://reviews.llvm.org/D21598
llvm-svn: 273532
There are two motivations for this patch. The first one is a preparation
for support MIPS TLS relocations. It might sound like a joke but for GOT
entries related to TLS relocations MIPS ABI uses almost regular approach
with creation of dynamic relocations for each GOT enty etc. But we need
to separate these 'regular' TLS related entries from MIPS specific local
and global parts of GOT. ABI declare simple solution - all TLS related
entries allocated at the end of GOT after local/global parts. The second
motivation it to support GOT relocations for non-preemptible symbols
with addends. If we have more than one GOT relocations against symbol S
with different addends we need to create GOT entries for each unique
Symbol/Addend pairs.
So we store all MIPS GOT entries in separate containers. For non-preemptible
symbols we have to maintain two data structures. The first one is MipsLocal
vector. Each entry corresponds to the GOT entry from the 'local' part
of the GOT contains the symbol's address plus addend. The second one
is MipsLocalMap. It is a map from Symbol/Addend pair to the GOT index.
Differential Revision: http://reviews.llvm.org/D21297
llvm-svn: 273127
I think it is me who named these variables, but I always find that
they are slightly confusing because align is a verb.
Adding four letters is worth it.
llvm-svn: 272984
This is mostly extracted from http://reviews.llvm.org/D18960.
The general idea for tlsdesc is that the two GD got entries are used
for a function pointer and its argument. The dynamic linker sets
both. In the non-dlopen case the dynamic linker sets the function to
the identity and the argument to the offset in the tls block.
All that the static linker has to do in the non-dlopen case is
relocate the code to point to the got entries and create a dynamic
relocation.
The dlopen case is more complicated, but can be implemented in another patch.
llvm-svn: 271569
Patch implements next relaxation from latest ABI:
"Convert memory operand of test and binop into immediate operand, where binop is one of adc, add, and, cmp, or,
sbb, sub, xor instructions, when position-independent code is disabled."
It is described in System V Application Binary Interface AMD64 Architecture Processor
Supplement Draft Version 0.99.8 (https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf,
B.2 "B.2 Optimize GOTPCRELX Relocations").
Differential revision: http://reviews.llvm.org/D20793
llvm-svn: 271405
MergedInputSection::getOffset is the busiest function in LLD if string
merging is enabled and input files have lots of mergeable sections.
It is usually the case when creating executable with debug info,
so it is pretty common.
The reason why it is slow is because it has to do faily complex
computations. For non-mergeable sections, section contents are
contiguous in output, so in order to compute an output offset,
we only have to add the output section's base address to an input
offset. But for mergeable strings, section contents are split for
merging, so they are not contigous. We've got to do some lookups.
We used to do binary search on the list of section pieces.
It is slow because I think it's hostile to branch prediction.
This patch replaces it with hash table lookup. Seems it's working
pretty well. Below is "perf stat -r10" output when linking clang
with debug info. In this case this patch speeds up about 4%.
Before:
6584.153205 task-clock (msec) # 1.001 CPUs utilized ( +- 0.09% )
238 context-switches # 0.036 K/sec ( +- 6.59% )
0 cpu-migrations # 0.000 K/sec ( +- 50.92% )
1,067,675 page-faults # 0.162 M/sec ( +- 0.15% )
18,369,931,470 cycles # 2.790 GHz ( +- 0.09% )
9,640,680,143 stalled-cycles-frontend # 52.48% frontend cycles idle ( +- 0.18% )
<not supported> stalled-cycles-backend
21,206,747,787 instructions # 1.15 insns per cycle
# 0.45 stalled cycles per insn ( +- 0.04% )
3,817,398,032 branches # 579.786 M/sec ( +- 0.04% )
132,787,249 branch-misses # 3.48% of all branches ( +- 0.02% )
6.579106511 seconds time elapsed ( +- 0.09% )
After:
6312.317533 task-clock (msec) # 1.001 CPUs utilized ( +- 0.19% )
221 context-switches # 0.035 K/sec ( +- 4.11% )
1 cpu-migrations # 0.000 K/sec ( +- 45.21% )
1,280,775 page-faults # 0.203 M/sec ( +- 0.37% )
17,611,539,150 cycles # 2.790 GHz ( +- 0.19% )
10,285,148,569 stalled-cycles-frontend # 58.40% frontend cycles idle ( +- 0.30% )
<not supported> stalled-cycles-backend
18,794,779,900 instructions # 1.07 insns per cycle
# 0.55 stalled cycles per insn ( +- 0.03% )
3,287,450,865 branches # 520.799 M/sec ( +- 0.03% )
72,259,605 branch-misses # 2.20% of all branches ( +- 0.01% )
6.307411828 seconds time elapsed ( +- 0.19% )
Differential Revision: http://reviews.llvm.org/D20645
llvm-svn: 270999
MIPS .reginfo and .MIPS.options sections are consumed by the linker, and
the linker produces a single output section. But it is possible that
input files contain section symbol points to the corresponding input
section. In case of generation a relocatable output we need to write
such symbols to the output file.
Fixes bug 27878.
Differential Revision: http://reviews.llvm.org/D20688
llvm-svn: 270910
System V Application Binary Interface AMD64 Architecture Processor Supplement Draft Version 0.99.8
(https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-r249.pdf, B.2 "B.2 Optimize GOTPCRELX Relocations")
introduces possible relaxations for R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX.
That patch implements the next relaxation:
mov foo@GOTPCREL(%rip), %reg => lea foo(%rip), %reg
and also opens door for implementing all other ones.
Implementation was suggested by Rafael Ávila de Espíndola with few additions and testcases by myself.
Differential revision: http://reviews.llvm.org/D15779
llvm-svn: 270705
Previously, mergeable section's constructors did more than just
setting member variables; it split section contents into small
pieces. It is not always computationally cheap task because if
the section is a mergeable string section, it needs to scan the
entire section to split them by NUL characters.
If a section would be thrown away by GC, that cost ended up
being a waste of time. It is going to be larger problem if the
section is compressed -- the whole time to uncompress it and
split it up is going to be a waste.
Luckily, we can defer section splitting after GC. We just have
to remember which offsets are in use during GC and apply that later.
This patch implements it.
Differential Revision: http://reviews.llvm.org/D20516
llvm-svn: 270455
This patch adds Size member to SectionPiece so that getRangeAndSize
can just return a SectionPiece instead of a std::pair<SectionPiece *, uint_t>.
Also renamed the function.
llvm-svn: 270346
We were using std::pair to represents pieces of splittable section
contents. It hurt readability because "first" and "second" are not
meaningful. This patch give them names.
One more thing is that piecewise liveness information is stored to
the second element of the pair as a special value of output section
offset. It was confusing, so I defiend a new bit, "Live", in the
new struct.
llvm-svn: 270340
This makes it explicit that each R_RELAX_TLS_* is equivalent to some
other expression.
With this I think we are at a sweet spot for how much is done in
Target.cpp. I did experiment with moving *all* the value math out of it.
It has the advantage that we know the final value in target independent
code, but it gets quite verbose.
llvm-svn: 270277
This adds direct support for computing offsets from the thread pointer
for both variants. Of the architectures we support, variant 1 is used
only by aarch64 (but that doesn't seem to be documented anywhere.)
llvm-svn: 270243
New names reflect purpose of corresponding GOT entries better.
Both expression types related to entries allocated in the 'local'
part of MIPS GOT. R_MIPS_GOT_LOCAL_PAGE is for entries contain 'page'
addresses. R_MIPS_GOT_LOCAL is for entries contain 'full' address.
llvm-svn: 269597
We were previously using an output offset of -1 for both GC'd and tail
merged pieces. We need to distinguish these two cases in order to filter
GC'd symbols from the symbol table -- we were previously asserting when we
asked for the VA of a symbol pointing into a dead piece, which would end
up asking the tail merging string table for an offset even though we hadn't
initialized it properly.
This patch fixes the bug by using an offset of -1 to exclusively mean GC'd
pieces, using 0 for tail merges, and distinguishing the tail merge case from
an offset of 0 by asking the output section whether it is tail merge.
Differential Revision: http://reviews.llvm.org/D19953
llvm-svn: 268604