7560 Commits

Author SHA1 Message Date
Rui Ueyama 843233b920 Start using make() in COFF.
We don't want ELF and COFF to diverge too much.

llvm-svn: 289085
2016-12-08 18:31:18 +00:00
Rui Ueyama 520d9169e6 Move Memory.{h,cpp} to lld/Support so that we can use them from COFF.
llvm-svn: 289084
2016-12-08 18:31:13 +00:00
Rafael Espindola d0ebd84c42 Change the implementation of --dynamic-list to use linker script parsing.
The feature is documented as
-----------------------------
The format of the dynamic list is the same as the version node
without scope and node name.  See *note VERSION:: for more
information.
--------------------------------

And indeed Qt uses a dynamic list with an 'extern "C++"' in it. With
this patch we support that.

The change to gc-sections-shared makes us match bfd. Just because we
kept bar doesn't mean it has to be in the dynamic symbol table.

The changes to invalid-dynamic-list.test and reproduce.s are because
of the new parser.

The changes to version-script.s are the only case where we change
behavior with regards to bfd, but I would like to see a mix of
--version-script and --dynamic-list used in the wild before
complicating the code.

llvm-svn: 289082
2016-12-08 17:54:26 +00:00
Rui Ueyama 6f6d46d497 Use `make` to simplify. NFC.
llvm-svn: 289081
2016-12-08 17:48:52 +00:00
Rui Ueyama 6dbf7ff747 Use make to instantiate Target and LinkerScript. NFC.
llvm-svn: 289079
2016-12-08 17:44:39 +00:00
Rui Ueyama e4eadb6a24 Split LinkerDriver::link. NFC.
llvm-svn: 289078
2016-12-08 17:44:37 +00:00
Rui Ueyama 34bf8677a4 Remove a special handling of AMDGPU entry points.
This is the last peculiar semantics left in the linker. If you want to
always set an entry point to 0, you can pass `-e 0` to the linker.

Differential Revision: https://reviews.llvm.org/D27532

llvm-svn: 289077
2016-12-08 17:32:58 +00:00
Rafael Espindola 7e71415cb3 Add support for 'extern "C"'.
It is used by Qt.

llvm-svn: 289074
2016-12-08 17:26:53 +00:00
Rui Ueyama 8cb6283e74 Make function names shorter. NFC.
llvm-svn: 289072
2016-12-08 17:18:09 +00:00
Rui Ueyama 248e4a344c Do not use template where template is not needed.
Compilers can inline and optimize this code in the same way as a template.

llvm-svn: 289071
2016-12-08 17:04:18 +00:00
Rafael Espindola 4737f94132 Make this test more strict. NFC.
llvm-svn: 289069
2016-12-08 16:51:56 +00:00
Rafael Espindola 191390a851 Inline function called only once.
llvm-svn: 289067
2016-12-08 16:26:20 +00:00
Rafael Espindola 361da4cef7 Handle C++ names in anon scripts.
llvm-svn: 289066
2016-12-08 16:20:29 +00:00
Rafael Espindola defdfa86c1 Inline two functions called only once. NFC.
llvm-svn: 289065
2016-12-08 16:02:48 +00:00
Rafael Espindola c65aee64ec Add two helper functions. NFC.
llvm-svn: 289064
2016-12-08 15:56:33 +00:00
Rafael Espindola 39c16dfbce Simplify. NFC.
llvm-svn: 289062
2016-12-08 15:36:58 +00:00
Simon Atanasyan 872764f6fe [ELF] Correct addAbsolute function argument name
Follow-up to r289025.

llvm-svn: 289061
2016-12-08 15:29:17 +00:00
George Rimar c49fd8c477 [ELF] - Read 16 bits for R_386_16/R_386_PC16 relocations instead of 32.
It looks like this was theoretically incorrect when the section is at the very end of the file,
because reading 32 bits would run past the end of the file.

llvm-svn: 289046
2016-12-08 13:50:28 +00:00
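The point of the R_386_16/R_386_PC16 fix above can be pictured with a small sketch. It assumes a little-endian byte buffer; `read16le` and `readField16` are local helpers written for this illustration and are not quoted from LLD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative helper: read a 16-bit little-endian value, touching exactly
// two bytes of the buffer.
inline uint16_t read16le(const uint8_t *P) {
  return static_cast<uint16_t>(P[0] | (P[1] << 8));
}

// Read the 16-bit field at offset Off. This is valid whenever
// Off + 2 <= Size, even if the field is the last two bytes of the buffer.
// A 32-bit read at the same offset would need Off + 4 <= Size.
inline uint16_t readField16(const uint8_t *Buf, size_t Size, size_t Off) {
  assert(Off + 2 <= Size && "16-bit field must fit in the buffer");
  return read16le(Buf + Off);
}
```

With a four-byte buffer, a field at offset 2 is readable with the 16-bit helper, whereas a 32-bit read at that offset would need two bytes that do not exist.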
Peter Smith baffdb8bc2 [ELF] ifunc implementation using synthetic sections
This change introduces new synthetic sections IpltSection, IgotPltSection
that represent the ifunc entries that would previously have been put in
the PltSection and the GotPltSection. The separation makes sure that
the R_*_IRELATIVE relocations are placed after the non R_*_IRELATIVE
relocations, which permits ifunc resolvers to know that the .got.plt
slots will be initialized prior to the resolver being called.

A secondary benefit is that for ARM we can move the IgotPltSection and its
dynamic relocations to the .got and .rel.dyn as the ARM glibc expects all
the R_*_IRELATIVE relocations to be in the .rel.dyn

Differential revision: https://reviews.llvm.org/D27406

llvm-svn: 289045
2016-12-08 12:58:55 +00:00
Simon Atanasyan 6a4eb75c46 [ELF][MIPS] Make _gp, _gp_disp, __gnu_local_gp global symbols
These MIPS specific symbols should be global because in general they can
have an arbitrary value. By default this value is a fixed offset from .got
section.

This patch adds more checks to the mips-gp-local.s test case but marks
it as XFAIL because LLD does not allow a linker script to redefine the value
of an absolute symbol. This should be fixed by D27276.

Differential revision: https://reviews.llvm.org/D27524

llvm-svn: 289025
2016-12-08 06:19:47 +00:00
Rafael Espindola 41217616a8 Delete dead code.
Thanks to George Rimar for pointing it out.

llvm-svn: 289020
2016-12-08 03:17:05 +00:00
Rui Ueyama 9a97acc097 Remove redundant call of std::unique_ptr::get.
Obj is an instance of std::unique_ptr, so *Obj.get() is the same as *Obj.

llvm-svn: 288996
2016-12-07 23:26:39 +00:00
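A minimal sketch of why the `get()` call above is redundant. `ObjectFile` is a hypothetical stand-in type used only for this example.

```cpp
#include <memory>

// Hypothetical stand-in for the object owned by the unique_ptr.
struct ObjectFile {
  int Id;
};

// operator* on std::unique_ptr already dereferences the stored pointer,
// so *Obj and *Obj.get() refer to the same object.
inline bool sameObject(const std::unique_ptr<ObjectFile> &Obj) {
  return &*Obj == &*Obj.get();
}
```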
Rui Ueyama 332e02a164 Fix Windows buildbots.
clang-format-diff sorted these #include's in the asciibetical order,
but they need to be in this order.

llvm-svn: 288995
2016-12-07 23:24:32 +00:00
Rui Ueyama 4c5b8cea02 Make demangle() return None instead of "" if a given string is not a mangled symbol.
llvm-svn: 288993
2016-12-07 23:17:05 +00:00
Rui Ueyama a45d45e231 COFF: Define overloaded toString functions.
Previously, we had different ways to stringize SymbolBody and InputFile
to construct error messages. This patch defines overloaded function
toString() so that we don't need to memorize all these different
function names.

With that change, it is now easy to include demangled names in error
messages. Now, if there is a symbol name conflict, we'll print out
both mangled and demangled names.

llvm-svn: 288992
2016-12-07 23:17:02 +00:00
Rafael Espindola d4db0b3748 Rename MaxPageSize to DefaultMaxPageSize to avoid confusion.
Thanks to Rui for the suggestion.

llvm-svn: 288982
2016-12-07 21:13:27 +00:00
Rui Ueyama 007f9f3e3d Fix Windows buildbots.
llvm-svn: 288975
2016-12-07 20:31:46 +00:00
Rafael Espindola 476d20764a Use the correct MaxPageSize.
Now Target->MaxPageSize is only used as the default value of
Config->MaxPageSize.

llvm-svn: 288974
2016-12-07 20:29:46 +00:00
Rui Ueyama f966fe6c38 Do not pass line number to convertToUnixPathSeparator.
Line number can never contain '/' or '\', so the previous code
was pointless at that point.

llvm-svn: 288973
2016-12-07 20:25:45 +00:00
Rui Ueyama 63d064e9d1 Make convertToUnixPathSeparator return a new string instead of mutating argument.
llvm-svn: 288972
2016-12-07 20:22:27 +00:00
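The interface change above (return a new string rather than mutate the argument) might look roughly like the sketch below. It uses `std::string` and converts unconditionally; the real function may differ in its string type and in when it converts, so treat this as an assumption-laden illustration.

```cpp
#include <algorithm>
#include <string>

// Sketch: return a copy of Path with backslashes replaced by forward
// slashes. The argument itself is left untouched.
inline std::string convertToUnixPathSeparator(const std::string &Path) {
  std::string Ret = Path;
  std::replace(Ret.begin(), Ret.end(), '\\', '/');
  return Ret;
}
```

Returning a value makes call sites easier to read, because the caller can see which string holds the converted path.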
Rafael Espindola 8b8f74f6b7 Simplify. NFC.
llvm-svn: 288971
2016-12-07 20:20:39 +00:00
Rafael Espindola a86a9c6fad Use the correct MaxPageSize.
Found by inspection.

llvm-svn: 288970
2016-12-07 20:10:43 +00:00
George Rimar 49326e4d84 Format. NFC.
llvm-svn: 288967
2016-12-07 19:44:27 +00:00
George Rimar 92c88a4f76 [ELF] - Print absolute file name in errors when possible.
Currently LLD prints basename of source file name in error messages,
for example:
$ mkdir foo
$ echo 'void _start(void) { foobar(); }' > foo/bar.c
$ gcc -g -c foo/bar.c
$ bin/ld.lld -o out bar.o 
bin/ld.lld: error: bar.c:1: undefined symbol 'foobar'
$
This should say:
bin/ld.lld: error: foo/bar.c:1: undefined symbol 'foobar'

This is PR31299

Differential revision: https://reviews.llvm.org/D27506

llvm-svn: 288966
2016-12-07 19:42:25 +00:00
Adhemerval Zanella d719d37156 ELF/AArch64: Refactor R_AARCH64_LDST{8,16,32,64,128}_ABS_LO12_NC Relocations
This patch refactors how the R_AARCH64_LDST{8,16,32,64,128}_ABS_LO12_NC
relocations are applied by adding a new function that extracts the bits expected
by each relocation. This makes the expected bit ranges explicit and simplifies
the code that masks and shifts the desired values.

It also fixes the R_AARCH64_LDST128_ABS_LO12_NC mask, although in practice
the mask/shift always returns a 16-byte-aligned value.

Checked on AArch64 and with test-suite.

llvm-svn: 288921
2016-12-07 17:31:48 +00:00
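The bit extraction described in the refactoring above can be sketched as follows. The helper names `getBits` and `ldstLo12` are assumptions made for this example; the bit ranges follow the AArch64 ELF ABI, where an N-bit load/store takes bits [11:log2(N/8)] of the address.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits [End:Start] (inclusive) of Val. Hypothetical helper name.
inline uint64_t getBits(uint64_t Val, int Start, int End) {
  assert(0 <= Start && Start <= End && End < 63);
  uint64_t Mask = (uint64_t(1) << (End + 1 - Start)) - 1;
  return (Val >> Start) & Mask;
}

// Scaled immediate for an R_AARCH64_LDST<AccessBits>_ABS_LO12_NC relocation:
// the low 12 bits of the address with the alignment bits dropped.
inline uint64_t ldstLo12(uint64_t Val, unsigned AccessBits) {
  switch (AccessBits) {
  case 8:
    return getBits(Val, 0, 11);
  case 16:
    return getBits(Val, 1, 11);
  case 32:
    return getBits(Val, 2, 11);
  case 64:
    return getBits(Val, 3, 11);
  case 128:
    return getBits(Val, 4, 11);
  }
  assert(false && "unsupported access size");
  return 0;
}
```

Keeping the ranges in one place makes a wrong mask, such as using [11:2] for a 16-bit access, easy to spot.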
Rui Ueyama 4de746f326 Add comments and reorder code a bit to clarify the intention. NFC.
llvm-svn: 288885
2016-12-07 04:45:34 +00:00
Rui Ueyama 9e5f5effe5 Make a decision about whether we should warn on missing entry or not early.
Config->WarnMissingEntry is a single-purpose boolean variable, and
I think it's easier to understand than Config->HasEntry.

llvm-svn: 288883
2016-12-07 04:06:21 +00:00
Rui Ueyama a1407c4fdf Simplify -e <number> option handling.
This patch is to parse the entry symbol name lazily.

llvm-svn: 288882
2016-12-07 03:23:06 +00:00
Petr Hosek 7173340e1f [ELF] Fix the broken PPC test
This is related to the change in handling of entry point symbols.

Differential Revision: https://reviews.llvm.org/D27500

llvm-svn: 288880
2016-12-07 03:04:02 +00:00
Petr Hosek 2f50fef095 [ELF] Shared libraries should have entry point
Shared libraries should have their entry point set following the same rules as
regular binaries. The only difference is that if the default entry
point (_start or __start) isn't found (and none was set explicitly), we
shouldn't give a warning as we do for regular binaries.

Differential Revision: https://reviews.llvm.org/D27497

llvm-svn: 288878
2016-12-07 02:26:16 +00:00
Petr Hosek 668bebed6d [ELF] Only binaries should have DT_DEBUG entry
The presence of DT_DEBUG entry is unrelated to the existence of entry point.

Differential Revision: https://reviews.llvm.org/D27496

llvm-svn: 288877
2016-12-07 02:05:42 +00:00
George Rimar a2a32c2cc8 [ELF] - Teach LLD to recognize PT_OPENBSD_BOOTDATA
Minor patch to fix PR31288

OpenBSD commit:
d39116912b

Differential revision: https://reviews.llvm.org/D27458

llvm-svn: 288832
2016-12-06 17:57:42 +00:00
Rafael Espindola 074ba93ceb Don't print empty PT_LOAD.
If we do, the FreeBSD dynamic linker tries to call mmap with a size of 0,
which fails.

It is hard to avoid creating them when linker scripts are used, so we
just delete empty PT_LOADs at the end.

llvm-svn: 288808
2016-12-06 13:43:34 +00:00
Rafael Espindola e004d4bfc2 Don't crash trying to write a 0 addend.
For preemptable symbols the dynamic linker does all the work. Trying
to compute the addend is at best wasteful and can also lead to crashes
in programs that use TLS but don't define any TLS
variables.

llvm-svn: 288803
2016-12-06 12:19:24 +00:00
Rafael Espindola 43c3cb7786 Make the test a bit more strict. NFC.
llvm-svn: 288802
2016-12-06 12:15:12 +00:00
Rui Ueyama c8e6884871 Inline MergeInputSection::getData().
This change seems to make LLD 0.6% faster when linking Clang with
debug info. I don't want us to have lots of local optimizations,
but this function is very hot, and the improvement is small but
not negligible, so I think it's worth doing.

llvm-svn: 288757
2016-12-06 02:19:30 +00:00
Rafael Espindola d207c37806 Test only the relevant bits.
This test only needs to test the Type (SharedObject), the address of
the first PT_LOAD and the presence of PT_DYNAMIC.

llvm-svn: 288719
2016-12-05 22:27:21 +00:00
Rafael Espindola d6b42ee404 Don't check the symbol values in this test.
It only needs to find how many are local.

llvm-svn: 288716
2016-12-05 22:16:32 +00:00
Rui Ueyama c38860b0d2 Revert r288707: Split removeUnusedSyntheticSections into two functions.
That patch broke build.

llvm-svn: 288708
2016-12-05 21:39:35 +00:00
Rui Ueyama 29e7a19e17 Split removeUnusedSyntheticSections into two functions.
llvm-svn: 288707
2016-12-05 21:37:16 +00:00
Rafael Espindola 7b4dd875df Don't check symbol value in this test.
llvm-svn: 288704
2016-12-05 21:02:45 +00:00
Rafael Espindola 35ac8bab7b Don't check the symbol value in this test.
llvm-svn: 288702
2016-12-05 20:56:40 +00:00
Rafael Espindola c5b3334124 Don't check the symbol values.
This test is just about which symbols are in which table.

llvm-svn: 288701
2016-12-05 20:53:11 +00:00
Rafael Espindola fc4ddbbe8f Simplify test. NFC.
llvm-svn: 288700
2016-12-05 20:49:16 +00:00
Rafael Espindola b4e61e9dd0 Make test test what it should be testing.
Looks like the second section in this test was lost along the way.

llvm-svn: 288699
2016-12-05 20:42:58 +00:00
Rui Ueyama 44da9decb5 Include the object file name in an error message.
llvm-svn: 288686
2016-12-05 18:40:14 +00:00
Rui Ueyama fcd3fa83ea Use "equivalence class" instead of "color" to describe the concept in ICF.
Also add a citation to GNU gold safe ICF paper.

Differential Revision: https://reviews.llvm.org/D27398

llvm-svn: 288684
2016-12-05 18:11:35 +00:00
Rui Ueyama 6d12eaee8b Remove existing file in a separate thread asynchronously.
On Linux (and probably on other Unix-like systems), unlink(2) is
noticeably slow. It takes 250 milliseconds to remove a 1 GB file
on ext4 filesystem on my machine, whether the file is on SSD or
on a spinning disk.

To create a new result file, we remove the existing file first. So, if
you repeatedly link a 1 GB program in a regular compile-link-debug
cycle, every cycle wastes 250 milliseconds only to remove a file.

Since LLD can link a 1 GB program in about 5 seconds, that waste actually
matters.

This patch defines `unlinkAsync` function. The function spawns a
background thread to call unlink. The calling thread returns
almost immediately.

Differential Revision: https://reviews.llvm.org/D27295

llvm-svn: 288680
2016-12-05 17:40:37 +00:00
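A minimal sketch of the `unlinkAsync` idea described above, using only the standard library. The signature is an assumption: this version returns the `std::thread` so the caller can join it before exiting, which the commit message does not specify.

```cpp
#include <cstdio>
#include <string>
#include <thread>
#include <utility>

// Sketch: remove Path on a background thread so the calling thread can
// continue immediately. The returned thread must be joined (or detached)
// by the caller before it is destroyed.
inline std::thread unlinkAsync(std::string Path) {
  return std::thread([P = std::move(Path)] { std::remove(P.c_str()); });
}
```

The calling thread pays only the cost of spawning a thread; the slow removal overlaps with the rest of the link.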
Eugene Leviant 2a942c4b45 [ELF] Print file:line for unknown PHDR error
Differential revision: https://reviews.llvm.org/D27335

llvm-svn: 288678
2016-12-05 16:38:32 +00:00
Adhemerval Zanella a47ba192dc ELF/AArch64: Fix R_AARCH64_LDST16_ABS_LO12_NC mask
The relocation R_AARCH64_LDST16_ABS_LO12_NC should set a ld/st
immediate value to bits [11:1], not [11:2]. This patch fixes it
and adds a regression test case.

With this fix all the faulty tests on test-suite (clavm, lencod,
and trimaran) pass.

llvm-svn: 288670
2016-12-05 14:15:44 +00:00
Adhemerval Zanella df310646d8 ELF/AArch64: Simplify R_AARCH64_ADD_ABS_LO12_NC relocation
This patch uses updateAArch64Add when applying the relocation and removes
the comment.

llvm-svn: 288669
2016-12-05 14:15:03 +00:00
Adhemerval Zanella 6afe128ae5 ELF/AArch64: consolidate getAArch64Page implementation
This patch avoids duplicating the getAArch64Page code by removing the
implementation in InputSection.

llvm-svn: 288668
2016-12-05 14:14:26 +00:00
Rui Ueyama ca17841fc4 Run the last iteration of parallel_for_loop using a threadpool.
Remainders of tasks were run in the main thread, so parallel_for_each
could theoretically take twice as long as the ideal.

llvm-svn: 288631
2016-12-05 02:07:29 +00:00
Rui Ueyama 5cb712ed3c Simplify ICF alignment handling.
llvm-svn: 288630
2016-12-05 01:31:39 +00:00
Rui Ueyama 045d828158 Re-implement the optimization that I removed in r288527.
I removed a wrong optimization for ICF in r288527. Sean Silva suggested
in a post-commit review that the correct algorithm can be implemented
easily. This patch does so.

llvm-svn: 288620
2016-12-04 16:33:13 +00:00
Rafael Espindola 61d052d725 Don't discard .L symbol with -r.
They might be used by relocations.

Fixes pr31252.

llvm-svn: 288617
2016-12-04 08:34:17 +00:00
Rui Ueyama 28f22ae15e Update comment to clarify the machine spec.
llvm-svn: 288609
2016-12-04 02:34:29 +00:00
Rui Ueyama c3aacfd91b Add comments about the use of threads in LLD.
llvm-svn: 288606
2016-12-03 23:35:22 +00:00
Rui Ueyama 244a435ae3 Factor out common code to a header.
llvm-svn: 288599
2016-12-03 21:24:51 +00:00
Rafael Espindola 27004d336f Ignore SHF_INFO_LINK.
Some ELF producers (dtrace) put this flag in relocation sections and
some (MC) don't. If we don't ignore the flag we end up with multiple
relocation sections pointing to the same section, which we don't
support.

llvm-svn: 288585
2016-12-03 15:26:18 +00:00
George Rimar 1b3d34a298 [ELF] - Implemented R_386_16 and R_386_PC16 relocations
A program or object file using R_386_8, R_386_16, R_386_PC16 or R_386_PC8
relocations is not conformant to the latest ABI. The R_386_16 and R_386_8
relocations truncate the computed value to 16 bits and 8 bits
respectively. R_386_PC16 and R_386_16 are used by some
applications, for example by FreeBSD loaders.

Previously we did not take the addend into account for these relocations,
treating it as 0, which is wrong and was a cause of hangs.

This patch is needed, for example, for the FreeBSD pmbr (protective MBR).

Differential revision: https://reviews.llvm.org/D27303

llvm-svn: 288581
2016-12-03 07:30:30 +00:00
George Rimar 40c28c7f9a [ELF] - Change the way we compute offsets for binary output.
The binary output feature is a bit confusing. bfd and gold output sometimes differ a lot,
though it is important for FreeBSD mbr loaders.

This patch changes the way we compute file offsets for binary output.
This fixes PR31196.

Previously, offsets were calculated based on the offsets and addresses of sections
from the same loads:
if (Sec == First)
  return alignTo(Off, Target->MaxPageSize, Sec->Addr);
return First->Offset + Sec->Addr - First->Addr;

bfd assigns offsets for each section to VA - MinVA:
https://github.com/redox-os/binutils-gdb/blob/master/bfd/binary.c#L27
https://github.com/redox-os/binutils-gdb/blob/master/bfd/binary.c#L255 
(LMA == VA usually)

For now, this patch just stops creating phdrs for binary output.
As a result, no additional offset calculation is performed:

uintX_t getFileAlignment(uintX_t Off, OutputSectionBase *Sec) {
  OutputSectionBase *First = Sec->FirstInPtLoad;
  // If the section is not in a PT_LOAD, we have no other constraint.
  if (!First)
    return Off; // **First is always null, so this branch is always taken**

Combined with another patch, that is now enough to generate output
similar to what bfd produces for the mbr loader.

Differential revision: https://reviews.llvm.org/D27341

llvm-svn: 288580
2016-12-03 07:23:30 +00:00
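The bfd rule quoted in the commit above (each section's file offset is VA - MinVA) can be sketched like this. The `Section` struct and `assignBinaryOffsets` are hypothetical names for this illustration; the sketch shows the bfd rule itself, not what the LLD patch implements.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, minimal section record for illustration.
struct Section {
  uint64_t Addr;   // virtual address (VA)
  uint64_t Offset; // file offset in the binary output
};

// Give every section the file offset VA - MinVA, where MinVA is the lowest
// virtual address among all sections.
inline void assignBinaryOffsets(std::vector<Section> &Secs) {
  if (Secs.empty())
    return;
  uint64_t MinVA = Secs.front().Addr;
  for (const Section &S : Secs)
    MinVA = std::min(MinVA, S.Addr);
  for (Section &S : Secs)
    S.Offset = S.Addr - MinVA;
}
```

For two sections at 0x7c00 and 0x7e00, this places the first at offset 0 and the second at offset 0x200, so the file mirrors the memory layout byte for byte.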
George Rimar d250618c3e [ELF] - Disable relro when -omagic specified.
--omagic is an option to create old-fashioned executables in which
.text segments are writable. Today, the option is still in use to
create special-purpose programs such as boot loaders. It doesn't
make sense to create PT_GNU_RELRO for such executables.

Differential revision: https://reviews.llvm.org/D27297

llvm-svn: 288579
2016-12-03 07:09:28 +00:00
Rui Ueyama 5419861a52 Remove a wrong performance optimization.
This is a hack for single thread execution. We are using Color[0] and
Color[1] alternately on each iteration. This optimization is to look
at the next slot as opposed to the current slot to get recent results
early. It turns out that the assumption is wrong, because the other slot
does not always have the most recent values; it may instead have
stale values from the previous iteration. This patch removes that
performance hack.

llvm-svn: 288527
2016-12-02 18:40:43 +00:00
Rui Ueyama 83ec681a5c Removed a wrong assertion about non-colorable sections.
The assertion asserted that colorable sections can never have
a reference to non-colorable sections, but that was simply wrong.
They can have references to non-colorable sections. If that's the
case, referenced sections must be the same in terms of pointer
comparison.

llvm-svn: 288511
2016-12-02 17:23:58 +00:00
Rui Ueyama 3a618e5606 Port parallel ICF to COFF.
LLD used to take 11.73 seconds to link Clang. Now it is 6.94 seconds.
MSVC link takes 83.02 seconds. Note that ICF is enabled by default on
Windows, so a low latency ICF is more important than in ELF.

llvm-svn: 288487
2016-12-02 08:03:58 +00:00
Rafael Espindola 5708b2f8a6 Ignore R_X86_64_NONE.
It looks like the way dtrace works is

* The user creates .o files that reference magical symbol names.
* dtrace reads those files, collects the info it needs and changes the
  relocation to R_X86_64_NONE expecting the linker to ignore them.

llvm-svn: 288485
2016-12-02 08:00:09 +00:00
Rui Ueyama 27498b5dd5 Fix a bug in ICF involving COFF associative sections.
Associative sections are sections that need to be linked if their associated
sections are linked. Associative sections are used to append auxiliary data
such as debug info.

Previously, we compared all associative sections when comparing two comdat
sections. Because associative sections are usually not mergeable sections,
we missed a lot of mergeable sections. MSVC linker doesn't seem to check
the identity of associative sections.

This patch makes LLD ignore associative sections when doing ICF.

llvm-svn: 288483
2016-12-02 07:46:12 +00:00
Rui Ueyama 1b6bab011c Fix the worst-case performance of ICF.
r288228 seems to have regressed ICF performance in some cases in which
a lot of sections are actually mergeable. In r288228, I made a change
to create a Range object for each new color group. So every time we
split a group, we allocated and added a new group to a list of groups.

This patch essentially reverts r288228, with an improvement to
parallelize the original algorithm.

Now the ICF main loop is entirely allocation-free and lock-free.

Just like pre-r288228, we search for group boundaries by linear scan
instead of managing the information using the Range class. r288228 was
performance-neutral, and so is this patch.

I confirmed that this produces the exact same result as before
using chromium and clang as tests.

llvm-svn: 288480
2016-12-02 05:35:46 +00:00
Rafael Espindola 103fc28961 Add a test documenting how we handle addends on Elf_Rela.
llvm-svn: 288477
2016-12-02 04:20:47 +00:00
Rafael Espindola 858c092daa Allow duplicated abs symbols with the same value.
This is a fairly reasonable bfd extension since there is one obvious value.

dtrace depends on this feature as it creates multiple absolute
symbols with the same value.

llvm-svn: 288461
2016-12-02 02:58:21 +00:00
Rafael Espindola f4ff80c128 Write the addend to got entries when using Elf_Rel.
llvm-svn: 288451
2016-12-02 01:57:24 +00:00
Rui Ueyama 395859bdb7 Fix undefined behavior.
New items can be added to Ranges here, and that invalidates
an iterator that previously pointed to the end of the vector.

llvm-svn: 288443
2016-12-02 00:38:15 +00:00
Rui Ueyama a6cd5fe415 Add an assert instead of ignoring an impossible condition.
llvm-svn: 288419
2016-12-01 21:41:06 +00:00
Rui Ueyama 91ae861af5 Updates file comments and variable names.
Use "color" instead of "group id" to describe the ICF algorithm.

llvm-svn: 288409
2016-12-01 19:45:22 +00:00
Rui Ueyama c1835319c9 Parallelize ICF to make LLD's ICF really fast.
ICF is short for Identical Code Folding. It is a size optimization to
identify two or more functions that happen to have the same contents
and merge them. It usually reduces output size by a few percent.

ICF is slow because it is a computationally intensive process. I tried
to parallelize it before but failed because I couldn't make a
parallelized version produce consistent outputs. Although it didn't
create broken executables, every invocation of the linker generated
slightly different output, and I couldn't figure out why.

I think I now understand what was going on, and also came up with a
simple algorithm to fix it. This patch implements it.

The result is very exciting. Chromium, for example, has 780,662 input
sections, of which 20,774 are reducible by ICF. LLD previously took
7.980 seconds for ICF. Now it finishes in 1.065 seconds.

As a result, LLD can now link a Chromium binary (output size 1.59 GB)
in 10.28 seconds on my machine with ICF enabled. Compared to gold
which takes 40.94 seconds to do the same thing, this is an amazing
number.

From here, I'll describe what we are doing for ICF, what the
previous problem was, and what I did in this patch.

In ICF, two sections are considered identical if they have the same
section flags, section data, and relocations. Relocations are tricky,
because two relocations are considered the same if they have the same
relocation type, values, and if they point to the same section _in
terms of ICF_.

Here is an example. If foo and bar defined below are compiled to the
same machine instructions, ICF can (and should) merge the two,
although their relocations point to each other.

  void foo() { bar(); }
  void bar() { foo(); }

This is not an easy problem to solve.

What we are doing in LLD is some sort of coloring algorithm. We color
non-identical sections using different colors repeatedly, and sections
in the same color when the algorithm terminates are considered
identical. Here are the details:

  1. First, we color all sections using their hash values of section
  types, section contents, and numbers of relocations. At this moment,
  relocation targets are not taken into account. We just color
  sections that apparently differ in different colors.

  2. Next, for each color C, we visit sections having color C to see
  if their relocations are the same. Relocations are considered equal
  if their targets have the same color. We then recolor sections that
  have different relocation targets in new colors.

  3. If we recolor some section in step 2, relocations that were
  previously pointing to the same color targets may now be pointing to
  different colors. Therefore, repeat 2 until a convergence is
  obtained.

Step 2 is a heavy operation. For Chromium, the first iteration of step
2 takes 2.882 seconds, and the second iteration takes 1.038 seconds,
and in total it needs 23 iterations.

Parallelizing step 1 is easy because we can color each section
independently. This patch does that.

Parallelizing step 2 is tricky. We could work on each color
independently, but we cannot recolor sections in place, because it
will break the invariant that two possibly-identical sections must
have the same color at any moment.

Consider sections S1, S2, S3, S4 in the same color C, where S1 and S2
are identical, S3 and S4 are identical, but S2 and S3 are not. Thread
A is about to recolor S1 and S2 in C'. After thread A recolors S1 in
C', but before it recolors S2 in C', another thread B might observe S1 and
S2. Then thread B will conclude that S1 and S2 are different, and it
will wrongly split thread B's sections into smaller groups. Over-splitting
doesn't produce broken results, but it loses a chance to
merge some identical sections. That was the cause of the nondeterminism.

To fix the problem, I made sections have two colors, namely current
color and next color. At the beginning of each iteration, both colors
are the same. Each thread reads from current color and writes to next
color. In this way, we prevent threads from reading partial
results. After each iteration, we flip current and next.

This is a very simple solution and is implemented in less than 50
lines of code.

I tested this patch with Chromium and confirmed that this parallelized
ICF produces the same output as the non-parallelized one.

Differential Revision: https://reviews.llvm.org/D27247

llvm-svn: 288373
2016-12-01 17:09:04 +00:00
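The three coloring steps and the current/next color idea from the commit above can be sketched in a small, sequential form. This is an illustration under stated assumptions, not LLD's implementation: the `Section` model, the use of `std::map` to hand out colors, and the name `colorSections` are all hypothetical, section flags are omitted, and nothing here is parallel. What it does preserve is the key rule that each pass reads only current colors and writes only next colors.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified section model for illustration only.
struct Section {
  std::string Data;           // section contents (flags omitted for brevity)
  std::vector<size_t> Relocs; // indices of the sections the relocations target
  uint32_t Color[2] = {0, 0}; // two slots: one is "current", the other "next"
};

// Colors the sections and returns the final color of each one.
// Sections that end up with equal colors are candidates for folding.
inline std::vector<uint32_t> colorSections(std::vector<Section> &Secs) {
  // Step 1: initial colors from contents and relocation count only;
  // relocation targets are not looked at yet.
  std::map<std::pair<std::string, size_t>, uint32_t> Initial;
  for (Section &S : Secs) {
    auto Key = std::make_pair(S.Data, S.Relocs.size());
    uint32_t Fresh = static_cast<uint32_t>(Initial.size());
    S.Color[0] = S.Color[1] = Initial.emplace(Key, Fresh).first->second;
  }

  int Cur = 0;
  size_t NumColors = Initial.size();
  for (;;) {
    // Step 2: recolor by (own current color, current colors of targets).
    std::map<std::vector<uint32_t>, uint32_t> Next;
    for (Section &S : Secs) {
      std::vector<uint32_t> Sig{S.Color[Cur]};
      for (size_t Target : S.Relocs)
        Sig.push_back(Secs[Target].Color[Cur]); // reads "current" only
      uint32_t Fresh = static_cast<uint32_t>(Next.size());
      S.Color[1 - Cur] = Next.emplace(Sig, Fresh).first->second; // writes "next"
    }
    Cur = 1 - Cur; // flip current and next

    // Step 3: a pass can only split groups, so an unchanged number of
    // colors means nothing was split and the coloring has converged.
    if (Next.size() == NumColors)
      break;
    NumColors = Next.size();
  }

  std::vector<uint32_t> Result;
  for (const Section &S : Secs)
    Result.push_back(S.Color[Cur]);
  return Result;
}
```

With two sections that have the same contents and point at each other (the foo/bar case above), both keep the same color through every pass, while a third section with the same contents but a different relocation target is split off in the first pass.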
Sean Silva 2eed75926c Add `isRelExprOneOf` helper
In various places in LLD's hot loops, we have expressions of the form
"E == R_FOO || E == R_BAR || ..." (E is a RelExpr).

Some of these expressions are quite long, and even though they usually go just
a very small number of ways and so should be well predicted, they can still
occupy branch predictor resources harming other parts of the code, or they
won't be predicted well if they overflow branch predictor resources or if the
branches are too dense and the branch predictor can't track them all (the
compiler can in theory avoid this, at a cost in text size). And some of these
expressions are so large and executed so frequently that even when
well-predicted they probably still have a nontrivial cost.

This speedup should be pretty portable. The cost of these simple bit tests is
independent of:

- the target we are linking for
- the distribution of RelExpr's for a given link (which can depend on how the
  input files were compiled)
- what compiler was used to compile LLD (it is just a simple bit test;
  hopefully the compiler gets it right!)
- adding new target-dependent relocations (e.g. needsPlt doesn't pay any extra
  cost checking R_PPC_PLT_OPD on x86-64 builds)

I did some rough measurements on clang-fsds and this patch gives a speedup of just over 4%
for a regular -O1 link, about 2.5% for -O3 --gc-sections and over 5%
for -O0. Sorry, I don't have my current machine set up for doing really
accurate measurements right now.

This also is just a bit cleaner. Thanks to Joerg for suggesting
this approach.

Differential Revision: https://reviews.llvm.org/D27156

llvm-svn: 288314
2016-12-01 05:43:48 +00:00
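The bit-test idea behind `isRelExprOneOf` can be sketched as below. The `RelExpr` enumerators listed here are a small illustrative subset chosen for the example, and the mask-builder struct is one plausible way to compute the mask at compile time; neither is quoted from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative subset of relocation expression kinds (assumed for this sketch).
enum RelExpr { R_ABS, R_GOT, R_GOT_PC, R_PLT, R_PLT_PC, R_PC };

// Builds, at compile time, a 64-bit mask with one bit set per listed RelExpr.
template <RelExpr... Exprs> struct RelExprMaskBuilder {
  static constexpr uint64_t build() { return 0; }
};

template <RelExpr Head, RelExpr... Tail>
struct RelExprMaskBuilder<Head, Tail...> {
  static_assert(Head < 64, "RelExpr is too large for a 64-bit mask");
  static constexpr uint64_t build() {
    return (uint64_t(1) << Head) | RelExprMaskBuilder<Tail...>::build();
  }
};

// Replaces "E == R_FOO || E == R_BAR || ..." with one shift and one AND
// against a constant mask.
template <RelExpr... Exprs> bool isRelExprOneOf(RelExpr Expr) {
  assert(0 <= Expr && Expr < 64 && "RelExpr is too large for a 64-bit mask");
  return (uint64_t(1) << Expr) & RelExprMaskBuilder<Exprs...>::build();
}
```

A call such as `isRelExprOneOf<R_PLT, R_PLT_PC>(E)` then costs the same no matter how many alternatives are listed, which is the property the commit message relies on.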
Rui Ueyama 10091b0ac2 Simplify ScriptParser.
- Rename currentBuffer -> getCurrentMB to start it with a verb.
- Simplify containsString.
- Add llvm_unreachable at end of getCurrentMB.

llvm-svn: 288310
2016-12-01 04:36:54 +00:00
Rui Ueyama 3cd22d3104 Do not name a variable Ret which is not a return value.
llvm-svn: 288309
2016-12-01 04:36:51 +00:00
Rui Ueyama b5f1c3ec0c Make get{Line,Column}Number members of StringParser.
This patch also renames currentLocation to getCurrentLocation.

llvm-svn: 288308
2016-12-01 04:36:49 +00:00
Rui Ueyama 50fb82743e Split getPos into getLineNumber and getColumnNumber.
llvm-svn: 288306
2016-12-01 03:56:27 +00:00
Rui Ueyama c5cb737584 Dump Codeview type information correctly.
llvm-svn: 288298
2016-12-01 01:22:48 +00:00
Rui Ueyama 9dedfb1fa8 Change how we manage groups in ICF.
Previously, on each iteration in ICF, we scan the entire vector of
input sections to find boundaries of groups having the same ID.

This patch changes the algorithm so that we now have a vector of ranges.
Each range contains a starting index and an ending index of the group.
So we no longer have to search boundaries on each iteration.

Performance-wise, this seems neutral. Instead of searching boundaries,
we now have to maintain ranges. But I think this is more readable
than the previous implementation.

Moreover, this makes it easy to parallelize the main loop of ICF,
which I'll do in a follow-up patch.

llvm-svn: 288228
2016-11-30 01:50:03 +00:00
Rui Ueyama 84e65a7ca1 Use StringRefZ explicitly instead of const char *.
This patch is to avoid an implicit conversion from const char * to
StringRefZ, to make it apparent where we are using StringRefZ.

llvm-svn: 288182
2016-11-29 19:11:39 +00:00
Rui Ueyama a13efc2a73 Introduce StringRefZ class to represent null-terminated strings.
StringRefZ is a class to represent a null-terminated string. String
length is computed lazily, so it's more efficient than StringRef to
represent strings in string table.

The motivation of defining this new class is to merge functions
that only differ in string types; we have many constructors that take
`const char *` or `StringRef`. With StringRefZ, we can merge them.

Differential Revision: https://reviews.llvm.org/D27037

llvm-svn: 288172
2016-11-29 18:05:04 +00:00
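A minimal sketch of a lazily measured, null-terminated string reference in the spirit of StringRefZ as described above. It uses `std::string` where the real class would presumably interoperate with `StringRef`, and the member names are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Sketch of a reference to a null-terminated string whose length is
// computed only when first needed, then cached.
class StringRefZ {
public:
  // Length unknown: defer the strlen call until someone asks for the size.
  explicit StringRefZ(const char *S) : Start(S), Size(Unknown) {}
  // Length already known: no strlen call is ever needed.
  StringRefZ(const char *S, size_t Len) : Start(S), Size(Len) {}

  size_t size() const {
    if (Size == Unknown)
      Size = std::strlen(Start);
    return Size;
  }

  std::string str() const { return std::string(Start, size()); }

private:
  static constexpr size_t Unknown = static_cast<size_t>(-1);
  const char *Start;
  mutable size_t Size; // cache, filled in by size()
};
```

Entries in a string table that are never inspected never pay for a length scan, and a single constructor overload set can serve callers that hold either a raw pointer or a pointer with a known length.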
Peter Smith de3e73880e [ELF] Add support for static TLS to ARM
The module index dynamic relocation R_ARM_DTPMOD32 is always 1 for an
executable. When static linking and when we know that we are not a shared
object we can resolve the module index relocation statically.
    
The logic in handleNoRelaxTlsRelocation remains the same for Mips as it
has its own custom GOT writing code. For ARM we add the module index
relocation to the GOT when it can be resolved statically.
    
In addition the type of the RelExpr for the static resolution of TlsGotRel
should be R_TLS and not R_ABS as we need to include the size of
the thread control block in the calculation.
    
Addresses the TLS part of PR30218.

Differential revision: https://reviews.llvm.org/D27213

llvm-svn: 288153
2016-11-29 16:23:50 +00:00
George Rimar 9b3ae73fc8 [ELF] - Disable emitting multiple output sections when merging is disabled.
When -O0 is specified, we do not do section merging.
Before this patch, however, several sections were generated instead
of a single one, which is useless.

Differential revision: https://reviews.llvm.org/D27041

llvm-svn: 288151
2016-11-29 16:11:09 +00:00
George Rimar 3fb5a6dc9e [ELF] - Add support for processing the remaining allocatable synthetic sections from linker scripts.
This change continues what was started by D27040.
Now all allocatable synthetics should be available from the script side.

Differential revision: https://reviews.llvm.org/D27131

llvm-svn: 288150
2016-11-29 16:05:27 +00:00
Simon Atanasyan 160bf723f5 [ELF][MIPS] Make the order of GOT page address entries stable
llvm-svn: 288137
2016-11-29 13:26:04 +00:00
Simon Atanasyan 198dd205e6 [ELF][MIPS] Add new check to the test case in an attempt to investigate Windows build-bot failure
llvm-svn: 288132
2016-11-29 11:19:47 +00:00
Simon Atanasyan 9705ff74ed [ELF][MIPS] Restore Config->Threads for MIPS targets
llvm-svn: 288130
2016-11-29 10:24:00 +00:00
Simon Atanasyan 9fae3b8a2c [ELF][MIPS] Do not change MipsGotSection state in the getPageEntryOffset method
The MipsGotSection::getPageEntryOffset calculates the index of a GOT entry
with a "page" address. Previously this method changed the state
of MipsGotSection because it modified the PageIndexMap field. That led
to unpredictable results if getPageEntryOffset was called from multiple threads.

The patch makes getPageEntryOffset constant. To do so it calculates the GOT
entry index but does not update the PageIndexMap field. Later, in the
MipsGotSection::writeTo method, the linker calculates "page" addresses and
writes them to the output.

llvm-svn: 288129
2016-11-29 10:23:56 +00:00
Simon Atanasyan a0efc4268c [ELF][MIPS] Replace the magic number of GOT header entries by constant. NFC
llvm-svn: 288128
2016-11-29 10:23:50 +00:00
Simon Atanasyan 80f3d9ce93 [ELF][MIPS] Fix calculation of GOT "page address" entries number
If an output section referenced by R_MIPS_GOT_PAGE or R_MIPS_GOT16
relocations is small (less than 0x10000 bytes) and occupies two adjacent
0xffff-byte pages, the current formula gives an incorrect number of required "page"
GOT entries. The problem is that at the time of calculation we do not know
the section address and so we cannot calculate the number of 0xffff-byte
pages exactly.

This patch fixes the formula. Now it gives a correct number of pages in
the worst case, when a "small" section crosses a 0xffff-byte page
boundary. On the other hand, it sometimes adds one redundant GOT
entry for each output section. But usually the number of output sections
referenced by GOT relocations is small.

llvm-svn: 288127
2016-11-29 10:23:46 +00:00
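To make the worst-case reasoning in the commit above concrete, here is a small sketch of an upper-bound estimate. The function name is hypothetical and the exact expression used by the patch is an assumption; the idea is only that, with the section address unknown, one extra page is reserved on top of the rounded-up page count.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the number of 0xffff-byte pages a section of Size bytes
// may need when its start address is not yet known. The "+ 1" accounts
// for the section crossing a page boundary. Assumed formula, for
// illustration only.
inline std::uint64_t worstCasePageCount(std::uint64_t Size) {
  assert(Size <= UINT64_MAX - 0xfffe && "size too large");
  return (Size + 0xfffe) / 0xffff + 1;
}
```

For a 0x100-byte section this yields 2, which covers the case where the section straddles a boundary and is one entry too many when it does not.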
George Rimar 595a763f38 [ELF] - Implemented -N (-omagic) command line option.
-N (-omagic)
  Set the text and data sections to be readable and writable. 
  Also, do not page-align the data segment.

Differential revision: https://reviews.llvm.org/D26888

llvm-svn: 288123
2016-11-29 09:43:51 +00:00
Eugene Leviant 84569e6caa [ELF] Refactor target error messages
Differential revision: https://reviews.llvm.org/D27097

llvm-svn: 288114
2016-11-29 08:05:44 +00:00
Rui Ueyama bf4ddeb97c Style fix.
llvm-svn: 288113
2016-11-29 04:22:57 +00:00
Rui Ueyama c910bc7e19 Simplify "missing argument" error message.
llvm-svn: 288112
2016-11-29 04:17:31 +00:00
Rui Ueyama e5e407beb4 Add comments.
llvm-svn: 288111
2016-11-29 04:17:30 +00:00
Rui Ueyama bd2a812ff0 Print error message header in red.
llvm-svn: 288110
2016-11-29 04:09:08 +00:00
Rafael Espindola f1e245315b Use relocations to fill statically known got entries.
Right now we just remember a SymbolBody for each got entry and
duplicate a bit of logic to decide what value, if any, should be
written for that SymbolBody.

With ARM there will be more complicated values, and it seems better to
just use the relocation code to fill the got entries. This makes it
clear that each entry is filled by the dynamic linker or by the static
linker.

llvm-svn: 288107
2016-11-29 03:45:36 +00:00
Rafael Espindola d3b32df3de Sort. NFC.
llvm-svn: 288102
2016-11-29 03:36:30 +00:00
Rafael Espindola 5498ba38df Add a test.
It would have found a missing case in another patch.

llvm-svn: 288101
2016-11-29 03:30:07 +00:00
George Rimar 1642c5d871 [ELF] - Do not put non exec sections first when -no-rosegment
That unifies the handling of the cases where we have SECTIONS and where
-no-rosegment is given in compareSectionsNonScript().

Config->SingleRoRx is now used for the check; a test case is provided.

llvm-svn: 288022
2016-11-28 10:26:21 +00:00
George Rimar 18a3096282 [ELF] - Set Config->SingleRoRx differently. NFC.
Previously Config->SingleRoRx was set in
createFiles() and used HasSections.

This change moves it to readConfigs, where
common flags are handled, and adds logic that sets
this flag separately from ScriptParser if SECTIONS is present.

llvm-svn: 288021
2016-11-28 10:11:10 +00:00
George Rimar 63bf011003 [ELF] - Implemented -no-rosegment.
--no-rosegment: Do not put read-only non-executable sections in their own segment

Differential revision: https://reviews.llvm.org/D26889

llvm-svn: 288020
2016-11-28 10:05:20 +00:00
Eugene Leviant ed30ce7ae4 [ELF] Print file:line for 'undefined section' errors
Differential revision: https://reviews.llvm.org/D27108

llvm-svn: 288019
2016-11-28 09:58:04 +00:00
Rafael Espindola 8e67000f1a Always create a PT_ARM_EXIDX if needed.
Unfortunately PT_ARM_EXIDX is special. There is no way to create it
from linker scripts, so we have to create it even if PHDRS is used.

This matches bfd and is required for the lld output to survive bfd's strip.

llvm-svn: 288012
2016-11-28 00:40:21 +00:00
Rui Ueyama 1dd86a664f Add parallel_for and use it where appropriate.
When we iterate over numbers as opposed to iterable elements,
parallel_for fits better than parallel_for_each.

llvm-svn: 288002
2016-11-27 19:28:32 +00:00
Rafael Espindola 5fcc99c27d Also skip regular symbol assignment at the start of a script.
Unfortunately some scripts look like

kernphys = ...
. = ....

and the expectation is that every orphan section is after the
assignment.

llvm-svn: 287996
2016-11-27 09:44:45 +00:00
Rafael Espindola 7fe4ec9b3a Don't put an orphan before the first . assignment.
This is a horrible special case, but seems to match bfd's behaviour
and is important for avoiding placing an orphan section before the
expected start of the file.

llvm-svn: 287994
2016-11-27 07:39:45 +00:00
Rui Ueyama e8a077badf Change return types of split{Non,}Strings.
They return new vectors, but at the same time they mutate other vectors,
so returning values doesn't make much sense. We should just mutate two
vectors.

llvm-svn: 287979
2016-11-26 15:15:11 +00:00
Rui Ueyama 72b1ee2533 Make getColorDiagnostics return a boolean value instead of an enum.
Config->ColorDiagnostics was of type enum before. Now it is just a
boolean flag. Thanks Rafael for suggestion.

llvm-svn: 287978
2016-11-26 15:10:01 +00:00
Rui Ueyama 1880bbed39 Split MergeOutputSection::finalize.
llvm-svn: 287977
2016-11-26 15:09:58 +00:00
Rafael Espindola f93b8c29c8 Create sections with just assignments as SHT_NOBITS.
This matches the behaviour of bfd ld. Using 0 was causing problems
with strip, which would remove these sections.

llvm-svn: 287969
2016-11-26 06:55:35 +00:00
Davide Italiano 3bfa081aa9 [ELF] Be compliant with LLVM and rename Lto into LTO. NFCI.
llvm-svn: 287967
2016-11-26 05:37:04 +00:00
Rui Ueyama d873e3a694 Fix buildbots.
llvm-svn: 287952
2016-11-25 20:42:39 +00:00
Rui Ueyama 1df9316922 Fix typo.
llvm-svn: 287951
2016-11-25 20:41:45 +00:00
Rui Ueyama c01321c6b8 Do not print out ARGV0 in white because it's unreadable on white background.
llvm-svn: 287950
2016-11-25 20:37:16 +00:00
Rui Ueyama 8c8818a58c Support -color-diagnostics={auto,always,never}.
-color-diagnostics=auto is default because that's the same as
Clang's default. When color is enabled, error or warning messages
are colored like this.

  error:
  <bold>ld.lld</bold> <red>error:</red> foo.o: no such file

  warning:
  <bold>ld.lld</bold> <magenta>warning:</magenta> foo.o: no such file

Differential Revision: https://reviews.llvm.org/D27117

llvm-svn: 287949
2016-11-25 20:27:32 +00:00
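As a rough sketch of how the three -color-diagnostics values from the commit above could collapse into the single boolean mentioned in the later getColorDiagnostics change: the helper name and the `StderrIsTty` parameter are hypothetical, and treating `auto` as "color only when stderr is a terminal" is an assumption based on Clang's usual behaviour rather than something the commit message states.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a -color-diagnostics value to a boolean.
// StderrIsTty is supplied by the caller; how it is detected is out of
// scope for this sketch.
inline bool useColor(const std::string &Mode, bool StderrIsTty) {
  assert((Mode == "auto" || Mode == "always" || Mode == "never") &&
         "unknown -color-diagnostics value");
  if (Mode == "always")
    return true;
  if (Mode == "never")
    return false;
  // "auto": assumed to follow the terminal check.
  return StderrIsTty;
}
```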
Rui Ueyama 6066641423 We shouldn't call parallel_for_each if -no-thread is given.
llvm-svn: 287948
2016-11-25 20:20:57 +00:00
Rui Ueyama 2555952ba8 Parallelize uncompress() and splitIntoPieces().
Uncompressing section contents and splitting mergeable section contents
into smaller chunks are heavy tasks. They scan entire section contents
and do CPU-intensive tasks such as uncompressing zlib-compressed data
or computing a hash value for each section piece.

Luckily, these tasks are independent of each other, so we can do that
in parallel_for_each. The number of input sections is large (as opposed
to the number of output sections), so there's a large parallelism here.

Actually the current design to call uncompress() and splitIntoPieces()
in batch was chosen with doing this in mind. Basically what we need to
do here is to replace `for` with `parallel_for_each`.

It seems this patch improves latency significantly if linked programs
contain debug info (which in turn contain lots of mergeable strings.)
For example, the latency to link Clang (debug build) improved by 20% on
my machine as shown below. Note that ld.gold took 19.2 seconds to do
the same thing.

Before:
    30801.782712 task-clock (msec)         #    3.652 CPUs utilized            ( +-  2.59% )
         104,084 context-switches          #    0.003 M/sec                    ( +-  1.02% )
           5,063 cpu-migrations            #    0.164 K/sec                    ( +- 13.66% )
       2,528,130 page-faults               #    0.082 M/sec                    ( +-  0.47% )
  85,317,809,130 cycles                    #    2.770 GHz                      ( +-  2.62% )
  67,352,463,373 stalled-cycles-frontend   #   78.94% frontend cycles idle     ( +-  3.06% )
 <not supported> stalled-cycles-backend
  44,295,945,493 instructions              #    0.52  insns per cycle
                                           #    1.52  stalled cycles per insn  ( +-  0.44% )
   8,572,384,877 branches                  #  278.308 M/sec                    ( +-  0.66% )
     141,806,726 branch-misses             #    1.65% of all branches          ( +-  0.13% )

     8.433424003 seconds time elapsed                                          ( +-  1.20% )

After:
    35523.764575 task-clock (msec)         #    5.265 CPUs utilized            ( +-  2.67% )
         159,107 context-switches          #    0.004 M/sec                    ( +-  0.48% )
           8,123 cpu-migrations            #    0.229 K/sec                    ( +- 23.34% )
       2,372,483 page-faults               #    0.067 M/sec                    ( +-  0.36% )
  98,395,342,152 cycles                    #    2.770 GHz                      ( +-  2.62% )
  79,294,670,125 stalled-cycles-frontend   #   80.59% frontend cycles idle     ( +-  3.03% )
 <not supported> stalled-cycles-backend
  46,274,151,813 instructions              #    0.47  insns per cycle
                                           #    1.71  stalled cycles per insn  ( +-  0.47% )
   8,987,621,670 branches                  #  253.003 M/sec                    ( +-  0.60% )
     148,900,624 branch-misses             #    1.66% of all branches          ( +-  0.27% )

     6.747548004 seconds time elapsed                                          ( +-  0.40% )

llvm-svn: 287946
2016-11-25 20:05:08 +00:00
Rui Ueyama 623b36e358 Move typedefs inside a class definition.
llvm-svn: 287945
2016-11-25 18:51:56 +00:00
Rui Ueyama 22375f2406 Remove a parameter from ScriptParser.
llvm-svn: 287944
2016-11-25 18:51:54 +00:00
Rui Ueyama da06bfb794 Move getLocation from Relocations.cpp to InputSection.cpp.
The function was used only within Relocations.cpp, but now we are
using it in many places, so this patch moves it to a file that better
fits its functionality.

llvm-svn: 287943
2016-11-25 18:51:53 +00:00
Eugene Leviant f04777527e [ELF] Add explicit template instantiations for toString
llvm-svn: 287938
2016-11-25 16:42:04 +00:00
Eugene Leviant ab024a353f [ELF] Refactor getDynRel to print error location
Differential revision: https://reviews.llvm.org/D27055

llvm-svn: 287915
2016-11-25 08:56:36 +00:00
Eugene Leviant c8c1b7bfae [ELF] EhOutputSection improvements
Differential revision: https://reviews.llvm.org/D27098

llvm-svn: 287914
2016-11-25 08:27:15 +00:00
George Rimar 11992c86d9 [ELF] - Add support for access to most of synthetic sections from linkerscript.
This is important for cases like:

  .sdata        : {
    *(.got.plt .got)
...
  }

That was not supported before as there was no way to get access to 
synthetic sections from script.

More details on review page.

Differential revision: https://reviews.llvm.org/D27040

llvm-svn: 287913
2016-11-25 08:05:41 +00:00
Rui Ueyama 26081caf48 Use toString() to report incompatible files.
llvm-svn: 287901
2016-11-24 20:59:44 +00:00
Rui Ueyama d4c94d1899 Include a hint how to see all errors if error is truncated.
This patch changes the error message from

  too many errors emitted, stopping now

to

  too many errors emitted, stopping now (use -error-limit=0 to see all errors)

Thanks to Sean for the suggestion!

llvm-svn: 287900
2016-11-24 20:31:43 +00:00
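The interaction between an error counter and an error limit, as implied by the hint message in the commit above, can be sketched like this. The struct, its member names, and the default limit of 20 are assumptions for illustration; a real linker would also stop at the limit rather than keep counting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical diagnostics sink. A limit of 0 means "no limit",
// matching the -error-limit=0 hint in the message text.
struct Diagnostics {
  std::uint64_t ErrorCount = 0;
  std::uint64_t ErrorLimit = 20; // assumed default
  std::vector<std::string> Out;  // collected messages

  // Derived from the counter, so no separate flag is needed.
  bool hasError() const { return ErrorCount > 0; }

  void error(const std::string &Msg) {
    assert(!Msg.empty() && "empty error message");
    if (ErrorLimit == 0 || ErrorCount < ErrorLimit)
      Out.push_back("error: " + Msg);
    else if (ErrorCount == ErrorLimit)
      Out.push_back("too many errors emitted, stopping now "
                    "(use -error-limit=0 to see all errors)");
    ++ErrorCount;
  }
};
```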
Rui Ueyama a3ac17372b Define toString(const SymbolBody &) and remove maybeDemangle instead.
Differential Revision: https://reviews.llvm.org/D27065

llvm-svn: 287899
2016-11-24 20:24:18 +00:00
Rafael Espindola 4862ae8cc5 Use a more explicit type for the sizeof.
llvm-svn: 287895
2016-11-24 16:38:35 +00:00
Peter Smith 719eb8efa5 [ELF] Add terminating sentinel .ARM.exidx table entry
The .ARM.exidx table has an entry for each function with the first entry
giving the start address of the function; the table is sorted in ascending
order of function address. Given a PC value, the unwinder will search the
table for the entry that contains the PC value.
    
If the table entry happens to be the last, the range of the addresses that
the final unwinding table describes will extend to the end of the address
space. To prevent an incorrect address outside the address range of the
program matching the last entry we follow ld.bfd's example and add a
sentinel EXIDX_CANTUNWIND entry at the end of the table. This gives the
final real table entry an upper bound.
    
In addition the llvm libunwind unwinder currently depends on the presence
of a sentinel entry (PR31091).

Differential revision: https://reviews.llvm.org/D26977

llvm-svn: 287869
2016-11-24 11:43:55 +00:00
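The effect of the sentinel described in the commit above can be illustrated with a simplified lookup over a sorted table. The entry layout and function name here are hypothetical and much simpler than the real .ARM.exidx encoding; the sketch only shows why a trailing cannot-unwind entry bounds the last real entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified table entry; the real format is encoded differently.
struct ExidxEntry {
  std::uint32_t Start; // start address of the range this entry covers
  bool CantUnwind;     // true for an EXIDX_CANTUNWIND-style entry
};

// Returns the entry whose range contains PC, or nullptr if PC lies
// before the first entry. Table must be sorted by ascending Start.
inline const ExidxEntry *findEntry(const std::vector<ExidxEntry> &Table,
                                   std::uint32_t PC) {
  assert(std::is_sorted(Table.begin(), Table.end(),
                        [](const ExidxEntry &A, const ExidxEntry &B) {
                          return A.Start < B.Start;
                        }) &&
         "table must be sorted");
  // First entry that starts after PC; the match is the one before it.
  auto It = std::upper_bound(
      Table.begin(), Table.end(), PC,
      [](std::uint32_t Addr, const ExidxEntry &E) { return Addr < E.Start; });
  if (It == Table.begin())
    return nullptr;
  return &*(It - 1);
}
```

Without the final sentinel, any address above the last function's start would match that function's entry; with it, such addresses resolve to the cannot-unwind entry instead.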
George Rimar 066bf6e1b3 [ELF] - Removed unused method. NFC.
llvm-svn: 287860
2016-11-24 09:42:10 +00:00
Rui Ueyama 3aaacb282f Update comment.
llvm-svn: 287850
2016-11-24 01:44:21 +00:00
Rui Ueyama f373dd76ce Remove HasError and use ErrorCount instead.
HasError was always true if ErrorCount > 0, so we can use ErrorCount instead.

llvm-svn: 287849
2016-11-24 01:43:21 +00:00
Rui Ueyama bf9523f549 [COFF] Add DebugInfoCodeView dependency
rL287555 introduces a link error when building with BUILD_SHARED_LIBS:

  undefined reference to llvm::codeview::CVSymbolDumper::dump(),
  and more...

The functions are available in libDebugInfoCodeView, from LLVM.

Patch by Visoiu Mistrih Francis!

llvm-svn: 287837
2016-11-23 22:58:25 +00:00
Rui Ueyama 2eda6d1633 Set default entry point to .text if no entry point is found.
Previously, if a symbol specified by -e or ENTRY() was not found,
we didn't set the entry point address. That is incompatible with GNU
because GNU linkers set the entry point to the first address of .text.
This patch implements that behavior.

llvm-svn: 287836
2016-11-23 22:41:00 +00:00
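The fallback order described in the commit above might look roughly like the following. The map-based symbol and section tables and the function name are hypothetical stand-ins, and returning 0 when there is no .text section either is an assumption, not something the commit message specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model: names mapped to addresses.
using AddrMap = std::map<std::string, std::uint64_t>;

// Resolves the entry point: the named symbol if it exists, otherwise
// the start of .text, otherwise 0 (assumed last resort).
inline std::uint64_t resolveEntry(const std::string &EntryName,
                                  const AddrMap &Symbols,
                                  const AddrMap &Sections) {
  assert(!EntryName.empty() && "entry name must be given");
  auto Sym = Symbols.find(EntryName);
  if (Sym != Symbols.end())
    return Sym->second;
  auto Text = Sections.find(".text");
  if (Text != Sections.end())
    return Text->second;
  return 0;
}
```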
Simon Atanasyan 8469b8841c [ELF][MIPS] Fix handling of _gp/_gp_disp/__gnu_local_gp symbols
The offset between the beginning of a .got section and the _gp symbol is used in MIPS
GOT relocation calculations. Usually the expression looks like
VA + Offset - GP, where VA is the .got section address, Offset is the offset
of the GOT entry, and GP is the offset between .got and _gp. There are also two "magic"
symbols, _gp_disp and __gnu_local_gp, which hold the offset mentioned above.
These symbols might be referenced by MIPS relocations.

Currently the linker always defines the _gp symbol and uses a hardcoded value for
its initialization, so the offset between .got and _gp is 0x7ff0. The _gp_disp
and __gnu_local_gp symbols are defined if required and initialized to 0x7ff0.
That is not correct, because the _gp symbol might be defined by a linker
script and hold an arbitrary value. In that case we need to use this value
in the relocation calculation and initialize _gp_disp and __gnu_local_gp
properly.

The patch fixes the problem and completes fixing the bug #30311.
https://llvm.org/bugs/show_bug.cgi?id=30311

Differential revision: https://reviews.llvm.org/D27036

llvm-svn: 287832
2016-11-23 22:22:16 +00:00