Commit Graph

225 Commits

Author SHA1 Message Date
Nico Weber c1a0e6fe6b llvm-undname: More no-op changes to increase test coverage
- Add test coverage around invalid anon namespaces and
  for error paths in demanglePrimitiveType() and in
  demangleFullyQualifiedTypeName()

- Use DEMANGLE_UNREACHABLE in two more unreachable places

llvm-svn: 362514
2019-06-04 15:38:00 +00:00
Nico Weber 880d21d3cb llvm-undname: Several behavior-preserving changes to increase coverage
- Replace `Error = true` in a few branches that are truly unreachable
  with DEMANGLE_UNREACHABLE

- Remove an early return in startsWithLocalScopePattern() because
  it's redundant with the next two early returns

- Remove unreachable `case '0'` (it's handled in the branch below)

- Remove an unused bool return

- Add test coverage for several early error returns, mostly in
  array type parsing

llvm-svn: 362506
2019-06-04 15:13:30 +00:00
Nico Weber 54362477c7 llvm-undname: Add more test coverage for demangleFunctionClass()
Also add two FC_Far that seem to be missing, by symmetry from
the public and protected cases. (But FC_Far isn't really a thing
anymore, so this doesn't really have an observable effect.)

llvm-svn: 362344
2019-06-02 23:26:57 +00:00
Nico Weber b5cd6163f4 Remove code path that's dead after r358835
llvm-svn: 362333
2019-06-02 17:41:07 +00:00
Nico Weber a2ca6e7803 llvm-undname: Support demangling char8_t
Ports clang's mangling support added in r354633 to llvm-undname.

llvm-svn: 361839
2019-05-28 15:30:04 +00:00
Nico Weber 88ab281b4d llvm-undname: Add support for local static thread guards
llvm-svn: 361835
2019-05-28 14:54:49 +00:00
Nico Weber f83c39e53f llvm-undname: Remove unreachable statement
llvm-svn: 361786
2019-05-28 01:20:36 +00:00
Nico Weber 82dc06c340 llvm-undname: Extract demangleMD5Name() method; no behavior change
llvm-svn: 361783
2019-05-27 23:10:42 +00:00
Nico Weber cfe08bc7d6 llvm-undname: Make demangling of MD5 names more robust
Demangler::parse() for MD5 names would:

1. Put all remaining text into the MD5 name sight unseen
2. Not modify MangledName

This meant that if the demangler recursively called parse() (e.g. in
demangleLocallyScopedNamePiece()), every recursive call that started on
an MD5 name would add all remaining bytes to the output buffer but
only advance the input by a byte.  For valid inputs, MD5 types are
never (well, see comments for 2 exceptions) nested, but for invalid
input this could cause memory use quadratic in the input size.

llvm-svn: 361744
2019-05-27 00:48:59 +00:00
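The fix described in cfe08bc7d6 comes down to consuming exactly the MD5 name and advancing the input past it, so that a recursive caller cannot re-read the same bytes. A minimal sketch of that idea follows. It is not the llvm-undname code: `consumeMD5Name` is a hypothetical helper, and the assumed shape of an MD5 name (`??@`, hash characters, closing `@`) is stated here as an assumption rather than taken from the commit.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the llvm-undname implementation.
// Assumption: an MD5 name looks like "??@" <hash chars> "@".
// On success, Name receives exactly that span and Mangled is advanced
// past it; on failure, Mangled is left untouched.
bool consumeMD5Name(std::string_view &Mangled, std::string_view &Name) {
  constexpr std::string_view Prefix = "??@";
  if (Mangled.substr(0, Prefix.size()) != Prefix)
    return false;
  const std::size_t End = Mangled.find('@', Prefix.size());
  if (End == std::string_view::npos)
    return false; // unterminated: reject instead of swallowing the rest
  Name = Mangled.substr(0, End + 1);
  Mangled.remove_prefix(End + 1);
  return true;
}
```

Because the input shrinks by the full length of the name on every successful call, the work done per input byte stays bounded even when such names are nested.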
Nico Weber 09fb2029e5 llvm-undname: Fix an assert-on-invalid, found by oss-fuzz
If a template parameter refers to a pointer to member, but the mangling
of that was a string literal instead of a real symbol, llvm-undname used
to crash instead of rejecting the input.

llvm-svn: 361402
2019-05-22 15:53:23 +00:00
Nico Weber 8d05eb8556 llvm-undname: Fix assert-on->4GiB-string-literal, found by oss-fuzz
llvm-svn: 359109
2019-04-24 16:09:38 +00:00
Nico Weber e8f21b1a6b llvm-undname: Support demangling the spaceship operator
Also add a test for demangling the co_await operator.

llvm-svn: 359007
2019-04-23 16:20:27 +00:00
Nico Weber f5c7f3ad33 llvm-undname: Fix an assert-on-invalid, found by oss-fuzz
llvm-svn: 358891
2019-04-22 15:05:18 +00:00
Nico Weber ce67a41741 llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t strings
llvm-undname used to put '\x' in front of every pair of nibbles, but
u"\xD7\xFF" produces a string with 6 bytes: \xD7 \0 \xFF \0 (and \0\0). The correct
spelling for a single character (plus terminating \0) is u"\xD7FF" instead.
Now, wchar_t, char16_t, and char32_t strings roundtrip from source to
clang-cl (and cl.exe) and then llvm-undname.

(...at least as long as it's not a string like L"\xD7FF" L"foo" which
gets demangled as L"\xD7FFfoo", where the compiler then considers the
"f" as part of the hex escape. That seems ok.)

Also add a comment saying that the "almost-valid" char32_t string I
added in my last commit is actually produced by compilers.

llvm-svn: 358857
2019-04-21 17:19:27 +00:00
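The difference between the two escaping styles in ce67a41741 can be sketched for a single 16-bit code unit. The helper names below are hypothetical and the code is only an illustration of the idea, not the llvm-undname implementation.

```cpp
#include <string>

// Hypothetical helpers for illustration only.
// Appends the two uppercase hex digits of one byte.
void appendHexByte(std::string &Out, unsigned Byte) {
  static const char Digits[] = "0123456789ABCDEF";
  Out += Digits[(Byte >> 4) & 0xF];
  Out += Digits[Byte & 0xF];
}

// Old style: a separate "\x" escape in front of every byte (pair of nibbles).
std::string escapePerByte(char16_t Unit) {
  std::string Out;
  Out += "\\x";
  appendHexByte(Out, (Unit >> 8) & 0xFF);
  Out += "\\x";
  appendHexByte(Out, Unit & 0xFF);
  return Out;
}

// New style: a single "\x" escape covering the whole code unit.
std::string escapePerUnit(char16_t Unit) {
  std::string Out = "\\x";
  appendHexByte(Out, (Unit >> 8) & 0xFF);
  appendHexByte(Out, Unit & 0xFF);
  return Out;
}
```

For the code unit 0xD7FF, `escapePerByte` yields `\xD7\xFF`, which a compiler reads back as two characters, while `escapePerUnit` yields `\xD7FF`, which reads back as one.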
Nico Weber 8fc9902bbb llvm-undname: Fix stack overflow on almost-valid
If an unsigned with all 4 bytes non-0 was passed to outputHex(), there
were two off-by-ones in it:

- Both MaxPos and Pos left space for the final \0, which left the buffer
  one byte too small. Set MaxPos to 16 instead of 15 to fix.

- The `assert(Pos >= 0);` was after a `Pos--`, move it up one line.

Since valid Unicode codepoints are <= 0x10ffff, this could never really
happen in practice.

Found by oss-fuzz.

llvm-svn: 358856
2019-04-21 16:58:25 +00:00
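The two off-by-ones in 8fc9902bbb are easier to see in a small right-to-left hex renderer. The sketch below is written from the description in the commit message only; the function name, the buffer layout, and the exact placement of the assertions are assumptions, and a 32-bit `unsigned` is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the actual outputHex(). Assumes a 32-bit unsigned.
// Renders each non-leading byte of C as "\xAB", filling the buffer from the
// right so the digits come out in left-to-right order.
std::string renderHexEscapes(unsigned C) {
  if (C == 0)
    return "\\x00";
  char Buf[17] = {};         // 4 bytes * 4 characters, plus the final NUL
  constexpr int MaxPos = 16; // index reserved for the final NUL
  int Pos = MaxPos - 1;      // last writable index
  while (C != 0) {
    for (int I = 0; I < 2; ++I) {
      assert(Pos >= 0);      // checked before the write, not after Pos--
      Buf[Pos--] = "0123456789ABCDEF"[C % 16];
      C /= 16;
    }
    assert(Pos >= 1);        // room for both 'x' and the backslash
    Buf[Pos--] = 'x';
    Buf[Pos--] = '\\';
  }
  return std::string(&Buf[Pos + 1]);
}
```

With all four bytes non-zero the loop writes exactly 16 characters, so `Pos` ends at -1 and the result starts at index 0. A buffer or `MaxPos` one smaller would write before the start of the array.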
Nico Weber aa162682ca llvm-undname: Fix stack overflow on invalid found by oss-fuzz
llvm-svn: 358852
2019-04-21 14:25:07 +00:00
Nico Weber 8eeaf5178d llvm-undname: Improve string literal demangling with embedded \0 chars
- Don't assert when a string looks like a u32 string to the heuristic
  but doesn't have a length that's 0 mod 4.  Instead, classify those
  as u16 with embedded \0 chars. Found by oss-fuzz.
- Print embedded nul bytes as \0 instead of \x00.

llvm-svn: 358835
2019-04-20 23:59:06 +00:00
Nico Weber e145a540cc llvm-undname: Attempt to fix leak-on-invalid found by oss-fuzz
llvm-svn: 358760
2019-04-19 14:13:11 +00:00
Nico Weber a0ac65c98f llvm-undname: Fix two more asserts-on-invalid, found by oss-fuzz
llvm-svn: 358708
2019-04-18 19:52:32 +00:00
Nico Weber 502cf4bd19 llvm-undname: Fix two asserts-on-invalid
llvm-svn: 358707
2019-04-18 19:30:21 +00:00
Nico Weber 930994ce14 llvm-undname: Consistently use "return nullptr" in functions returning pointers
llvm-svn: 358492
2019-04-16 14:24:42 +00:00
Nico Weber c035c243da llvm-undname: Fix nullptr deref on invalid structor names in template args
Similar to r358421: A StructorIdentifierNode has a Class field which
is read when printing it, but if the StructorIdentifierNode appears in
a template argument then demangleFullyQualifiedSymbolName() which sets
Class isn't called. Since StructorIdentifierNodes are always leaf
names, we can just reject them as well.

Found by oss-fuzz.

llvm-svn: 358491
2019-04-16 14:10:34 +00:00
Nico Weber 64041d7b90 llvm-undname: Fix nullptr deref on invalid conversion operator names in template args
A ConversionOperatorIdentifierNode has a TargetType which is read when
printing it, but if the ConversionOperatorIdentifierNode appears in a
template argument there's nothing that can provide the TargetType.
Normally the COIN is a symbol (leaf) name and takes its TargetType from the
symbol's type, but in a template argument context the COIN can only be
either a non-leaf name piece or a type, and must hence be invalid.

Similar to the COIN check in demangleDeclarator().

Found by oss-fuzz.

llvm-svn: 358421
2019-04-15 16:42:44 +00:00
Nico Weber ae050d214b llvm-undname: Fix oss-fuzz-found crash-on-invalid with incomplete special table nodes
llvm-svn: 358367
2019-04-14 23:32:37 +00:00
Nico Weber 63fe2593ae llvm-undname: Fix another crash-on-invalid found by oss-fuzz
llvm-svn: 358363
2019-04-14 23:08:12 +00:00
Nico Weber ef035186db llvm-undname: Use UNREACHABLE after exhaustive switch returning everywhere
No behavior change.

llvm-svn: 358241
2019-04-11 23:23:00 +00:00
Nico Weber af2ee7d0de llvm-undname: Name a bool param, no behavior change
llvm-svn: 358240
2019-04-11 23:20:18 +00:00
Nico Weber 03db625c13 llvm-undname: Fix out-of-bounds read on invalid intrinsic function code
Found by inspection.

llvm-svn: 358239
2019-04-11 23:11:33 +00:00
Nico Weber e5b62654a5 llvm-undname: Don't crash on incomplete enum tag manglings
Found by inspection.

llvm-svn: 358238
2019-04-11 22:59:25 +00:00
Nico Weber b4f33bbbb0 llvm-undname: Fix crash on incomplete virtual this adjusts
Found by oss-fuzz.

Also remove an else-after-return, this part has no behavior change.

llvm-svn: 358237
2019-04-11 22:47:18 +00:00
Nico Weber f2d8f09d5d llvm-undname: Fix crash on invalid name in a template parameter pointer to member arg
Found by oss-fuzz.

llvm-svn: 358234
2019-04-11 22:23:35 +00:00
Nico Weber 5f6eb1817a llvm-undname: Fix another crash-on-invalid
This fixes a regression from https://reviews.llvm.org/D60354. We used to

  SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
  if (Symbol) {
    Symbol->Name = QN;
  }

but changed that to
  SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
  if (Error)
    return nullptr;
  Symbol->Name = QN;

and one branch somewhere returned a nullptr without setting Error.

Looking at the code changed in r340083 and r340710 that branch looks
like a remnant from an earlier attempt to demangle RTTI descriptors
that has since been rewritten -- so just remove this branch. It
shouldn't change behavior for correctly mangled symbols.

llvm-svn: 358112
2019-04-10 17:31:34 +00:00
Nico Weber 63b97d2a67 llvm-undname: Fix more crashes and asserts on invalid inputs
For functions whose callers don't check that enough input is present,
add checks at the start of the function that enough input is there and
set Error otherwise.

For functions that return AST objects, return nullptr instead of
incomplete AST objects with nullptr fields if an error occurred during
the function.

Introduce a new function demangleDeclarator() for the sequence
demangleFullyQualifiedSymbolName(); demangleEncodedSymbol() and
use it in the two places that had this sequence. Let this new function
check that ConversionOperatorIdentifiers have a valid TargetType.

Some of the bad inputs found by oss-fuzz, others by inspection.

Differential Revision: https://reviews.llvm.org/D60354

llvm-svn: 357936
2019-04-08 19:46:53 +00:00
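The pattern described in 63b97d2a67 (check that enough input is present up front, set Error, and return nullptr rather than a half-built node) can be sketched as follows. `MiniParser`, `Node`, and `parsePair` are hypothetical names chosen for illustration and do not appear in llvm-undname.

```cpp
#include <memory>
#include <string_view>

// Hypothetical AST node holding a two-character code.
struct Node {
  char Tag;
  char Value;
};

// Hypothetical parser illustrating the error-handling convention.
struct MiniParser {
  bool Error = false;

  // Parses a two-character piece from the front of Input.
  std::unique_ptr<Node> parsePair(std::string_view &Input) {
    // Callers are not trusted to have checked the length, so check here.
    if (Input.size() < 2) {
      Error = true;
      return nullptr; // never hand back an incomplete node
    }
    auto N = std::make_unique<Node>(Node{Input[0], Input[1]});
    Input.remove_prefix(2);
    return N;
  }
};
```

Keeping the nullptr return and the Error flag in step matters: as the later regression fix in 5f6eb1817a shows, a branch that returns nullptr without setting Error defeats callers that test only the flag.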
Nico Weber c5615c2326 llvm-undname: Name a pair. No behavior change.
Differential Revision: https://reviews.llvm.org/D60210

llvm-svn: 357653
2019-04-03 23:29:05 +00:00
Nico Weber 1672581e96 llvm-undname: Fix a crash-on-invalid
Found by oss-fuzz, fixes issue 13260 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60207

llvm-svn: 357649
2019-04-03 23:27:18 +00:00
Nico Weber a9886f8278 llvm-undname: Fix an assert-on-invalid
Found by oss-fuzz, fixes issue 12432 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60206

llvm-svn: 357648
2019-04-03 23:23:32 +00:00
Nico Weber 321de48a94 llvm-undname: Fix an assert-on-invalid
Found by oss-fuzz, fixes issues 12428 and 12429 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60204

llvm-svn: 357647
2019-04-03 23:19:39 +00:00
Nico Weber c7444ddfe5 llvm-undname: Fix a crash-on-invalid
Found by oss-fuzz, fixes issues 12435 and 12438 on oss-fuzz.

Differential Revision: https://reviews.llvm.org/D60202

llvm-svn: 357646
2019-04-03 23:15:56 +00:00
Konstantin Zhuravlyov 8456cddedd Add missing include (cstdlib) to Demangle.h
Differential Revision: https://reviews.llvm.org/D57035

llvm-svn: 351861
2019-01-22 19:18:18 +00:00
Chandler Carruth 57b08b0944 Update more file headers across all of the LLVM projects in the monorepo
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351648
2019-01-19 10:56:40 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
James Henderson f5356944a1 Add __[_[_]]Z demangling to new common demangle function
This is a follow-up to r351448. It adds support for other _*Z extensions
of the Itanium demangling, to the newly available demangle function
heuristic.

Reviewed by: erik.pilkington, rupprecht, grimar

Differential Revision: https://reviews.llvm.org/D56855

llvm-svn: 351551
2019-01-18 13:58:41 +00:00
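The prefix heuristic from f5356944a1 can be sketched as a check for one to four leading underscores followed by `Z`. The function name and the exact bounds are assumptions read off the commit title (`__[_[_]]Z` in addition to the plain `_Z` prefix); this is not a copy of the library code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if Name starts with "_Z", "__Z",
// "___Z" or "____Z" (assumed from the commit title).
bool looksLikeItanium(std::string_view Name) {
  const std::size_t Pos = Name.find_first_not_of('_');
  return Pos != std::string_view::npos && Pos >= 1 && Pos <= 4 &&
         Name[Pos] == 'Z';
}
```

A caller following the approach in ce5b5b486a would try the Itanium demangler when this returns true and fall back to the Microsoft demangler otherwise.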
Erik Pilkington 5094e5ef8b NFC: Make the copies of the demangler byte-for-byte identical
With this patch, the copies of the files ItaniumDemangle.h,
StringView.h, and Utility.h are kept byte-for-byte in sync between
libcxxabi and llvm. All differences (namespaces, fallthrough, and
unreachable macros) are defined in each copy's DemanglerConfig.h.

This patch also adds a script to copy changes from libcxxabi
(cp-to-llvm.sh), and a README.txt explaining the situation.

Differential revision: https://reviews.llvm.org/D53538

llvm-svn: 351474
2019-01-17 20:37:51 +00:00
James Henderson ce5b5b486a Move demangling function from llvm-objdump to Demangle library
This allows it to be used in an upcoming llvm-readobj change.

A small change in internal behaviour of the function is to always call
the microsoftDemangle function if the string does not have an itanium
encoding prefix, rather than only if it starts with '?'. This is
harmless because the microsoftDemangle function does the same check
already.

Reviewed by: grimar, erik.pilkington

Differential Revision: https://reviews.llvm.org/D56721

llvm-svn: 351448
2019-01-17 15:18:44 +00:00
Zachary Turner 2fe4900525 [llvm-undname] Add support for demangling msvc's noexcept types.
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions.  This patch
makes llvm-undname properly demangle them.

Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769

llvm-svn: 350656
2019-01-08 21:05:51 +00:00
Zachary Turner ba797b6dae [MS Demangler] Add a flag for dumping types without tag specifier.
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc.  This patch adds a
flag enabling this.

llvm-svn: 350241
2019-01-02 18:33:12 +00:00
Zachary Turner 1b9a938b9a Add missing include file.
llvm-svn: 349363
2018-12-17 16:42:26 +00:00
Zachary Turner b472512a77 [MS Demangler] Add a helper function to print a Node as a string.
llvm-svn: 349359
2018-12-17 16:14:50 +00:00
Zachary Turner 8fb9a71dde [MS Demangler] Fail gracefully on invalid pointer types.
Once we detect a 'P', we know a pointer type is upcoming, so
we make some assumptions about the output that follows.  If those
assumptions didn't hold, we would assert.  Instead, we should
fail gracefully and propagate the error up.

llvm-svn: 349169
2018-12-14 18:10:13 +00:00
Zachary Turner 2cd3286ed2 Fix a crash in llvm-undname with invalid types.
llvm-svn: 349165
2018-12-14 17:43:56 +00:00
Pavel Labath 14f3e3aa36 [Demangle] remove itaniumFindTypesInMangledName
Summary:
This (very specialized) function was added to enable an LLDB use case.
Now that a more generic interface (overriding of parser functions -
D52992)  is available, and LLDB has been converted to use that (D54074),
the function is unused and can be removed.

Reviewers: erik.pilkington, sgraenitz, rsmith

Subscribers: mgorny, hiraditya, christof, libcxx-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D54893

llvm-svn: 347670
2018-11-27 16:11:24 +00:00
Nico Weber a92b463955 [MS Demangler] Print public:, protected:, private: if set in FunctionClass or a variable's StorageClass.
undname prints them, and the information is in the decorated name, so we probably shouldn't lose it when undecorating.

I spot-checked a few of the funnier-looking outputs, and undname has the same output.

Differential Revision: https://reviews.llvm.org/D54396

llvm-svn: 346791
2018-11-13 20:18:26 +00:00
Nico Weber 6808bc0f45 Make initializeOutputStream() return false on error and true on success.
As discussed in https://reviews.llvm.org/D52104

Differential Revision: https://reviews.llvm.org/D52143

llvm-svn: 346606
2018-11-11 10:04:00 +00:00
Nico Weber dfc08baceb [MS demangler] Use a slightly shorter unmangling for mangled strings.
Before: const wchar_t * {L"%"}
Now: L"%"

See also PR39593.
Differential Revision: https://reviews.llvm.org/D54294

llvm-svn: 346544
2018-11-09 19:28:50 +00:00
Reid Kleckner 4dc0b1ac60 Fix clang -Wimplicit-fallthrough warnings across llvm, NFC
This patch should not introduce any behavior changes. It consists of
mostly one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
   of only 'break'.

We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
   doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
   the outer case.

I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.

Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu

Differential Revision: https://reviews.llvm.org/D53950

llvm-svn: 345882
2018-11-01 19:54:45 +00:00
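The two kinds of change listed in 4dc0b1ac60 can be illustrated on a small switch. The sketch uses the standard C++17 `[[fallthrough]]` attribute as a stand-in for the LLVM_FALLTHROUGH macro named in the commit; the function itself is hypothetical.

```cpp
// Hypothetical example: counts how many setup steps a level requires.
int countSetupSteps(int Level) {
  int Steps = 0;
  switch (Level) {
  case 2:
    ++Steps;
    [[fallthrough]]; // explicit annotation instead of a "fall through" comment
  case 1:
    ++Steps;
    break; // explicit break before a case block consisting of only 'break'
  default:
    break;
  }
  return Steps;
}
```

With the annotation and the explicit `break` in place, the switch does not rely on comment recognition, which per the commit message only GCC performs.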
Zachary Turner 7ba905635f [MS Demangler] Expose the Demangler AST publicly.
LLDB would like to use this in order to build a clang AST from
a mangled name.

This is NFC otherwise.

llvm-svn: 345837
2018-11-01 15:07:32 +00:00
Pavel Labath f4c1582476 Port libcxxabi r344607 into llvm
Summary:
The original commit message was:
    This uses CRTP (for performance reasons) to allow a user to override
    demangler functions to implement custom parsing logic. The motivation
    for this is LLDB, which needs to occasionally modify the mangled names.
    One such instance is already implemented via the TypeCallback member,
    but this is very specific functionality which does not help with any
    other use case. Currently we have a use case for modifying the
    constructor flavours, which would require adding another callback. This
    approach does not scale.

    With CRTP, the user (LLDB) can override any function it needs without
    any special support from the demangler library. After LLDB is ported to
    use this instead of the TypeCallback mechanism, the callback can be
    removed.

The only difference here is the addition of a unit test which exercises
the CRTP mechanism to override a function in the parser.

Reviewers: erik.pilkington, rsmith, EricWF

Subscribers: mgorny, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D53300

llvm-svn: 344703
2018-10-17 18:50:25 +00:00
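The CRTP mechanism described in f4c1582476 can be sketched with a toy parser. All names below (`ParserBase`, `parseName`, `UpperParser`) are hypothetical and unrelated to the real demangler classes; the point is only that the base class dispatches through the derived type, so an override needs no virtual functions and no callback hooks.

```cpp
#include <string>

// Hypothetical CRTP base: calls go through Derived, resolved at compile time.
template <typename Derived> struct ParserBase {
  Derived &derived() { return static_cast<Derived &>(*this); }

  // Default parsing step; a derived class may shadow it.
  std::string parseName() { return "name"; }

  // Driver that uses whichever parseName() the derived class provides.
  std::string parse() { return derived().parseName() + "()"; }
};

// Uses the default behaviour unchanged.
struct DefaultParser : ParserBase<DefaultParser> {};

// Overrides a single step without any support from the base class.
struct UpperParser : ParserBase<UpperParser> {
  std::string parseName() { return "NAME"; }
};
```

Here `DefaultParser().parse()` produces `name()` and `UpperParser().parse()` produces `NAME()`, which mirrors how a user can replace one parsing function while reusing the rest of the parser.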
Erik Pilkington fbca8d5495 NFC: Fix a -Wsign-conversion warning
llvm-svn: 344564
2018-10-15 22:03:53 +00:00
Benjamin Kramer c55e997556 Move some helpers from the global namespace into anonymous ones.
llvm-svn: 344468
2018-10-13 22:18:22 +00:00
Nico Weber 1359d654e3 Update microsoftDemangle() to work more like itaniumDemangle().
* Use same method of initializing the output stream and its buffer
* Allow a nullptr Status pointer
* Don't print the mangled name on demangling error
* Write to N (if it is non-nullptr)

Differential Revision: https://reviews.llvm.org/D52104

llvm-svn: 342330
2018-09-15 18:24:20 +00:00
Zachary Turner a1f57030c6 Remove some debugging code that was accidentally left in.
llvm-svn: 341122
2018-08-30 21:00:57 +00:00
Zachary Turner 78ab3cb238 [MS Demangler] Add support for $$Z parameter pack separator.
$$Z appears between adjacent expanded parameter packs in the
same template instantiation.  We don't need to print it, it's
only there to disambiguate between manglings that would otherwise
be ambiguous.  So we just need to parse it and throw it away.

llvm-svn: 341119
2018-08-30 20:53:29 +00:00
Zachary Turner 32a8a2028c [MS Demangler] Fix several crashes and demangling bugs.
These bugs were found by writing a Python script which spidered
the entire Chromium build directory tree demangling every symbol
in every object file.  At the start, the tool printed:

  Processed 27443 object files.
  2926377/2936108 symbols successfully demangled (99.6686%)
  9731 symbols could not be demangled (0.3314%)
  14589 files crashed while demangling (53.1611%)

After this patch, it prints:

  Processed 27443 object files.
  41295518/41295617 symbols successfully demangled (99.9998%)
  99 symbols could not be demangled (0.0002%)
  0 files crashed while demangling (0.0000%)

The issues fixed in this patch are:

  * Ignore empty parameter packs.  Previously we would encounter
    a mangling for an empty parameter pack and add a null node
    to the AST.  Since we don't print these anyway, we now just
    don't add anything to the AST and ignore it entirely.  This
    fixes some of the crashes.

  * Account for "incorrect" string literal demanglings.  Apparently
    an older version of clang would not truncate mangled string
    literals to 32 bytes of encoded character data.  The demangling
    code however would allocate a 32 byte buffer thinking that it
    would not encounter more than this, and overrun the buffer.
    We now demangle up to 128 bytes of data, since the buggy
    clang would encode up to 32 *characters* of data.

  * Extended support for demangling init-fini stubs.  If you had
    something like
      struct Foo {
        static vector<string> S;
      };
    this would generate a dynamic atexit initializer *for the
    variable*.  We didn't handle this, but now we print something
    nice.  This is actually an improvement over undname, which will
    fail to demangle this at all.

  * Fixed one case of static this adjustment.  We weren't handling
    several thunk codes so we didn't recognize the mangling.  These
    are now handled.

  * Fixed a back-referencing problem.  Member pointer templates
    should have their components considered for back-referencing

The remaining 99 symbols which can't be demangled are all symbols
which are compiler-generated and undname can't demangle either.

llvm-svn: 341000
2018-08-29 23:56:09 +00:00
Zachary Turner b2fef1a0b0 Add support for various C++14 demanglings.
Mostly this includes <auto> and <decltype-auto> return values.
Additionally, this fixes a fairly obscure back-referencing bug
that was encountered in one of the C++14 tests, which is that
if you have something like Foo<&bar, &bar> then the `bar`
forms a backreference.

llvm-svn: 340896
2018-08-29 04:12:44 +00:00
Zachary Turner 38d2edd60d [MS Demangler] Add output flags to all function calls.
Previously we had a FunctionSigFlags, but it's more flexible
to just have one set of output flags that apply to the entire
process and just pipe the entire set of flags through the
output process.

This will be useful when we start allowing the user to customize
the outputting behavior.

llvm-svn: 340894
2018-08-29 03:59:17 +00:00
Chandler Carruth be4a54940e Fix this file to have the necessary standard library includes and use
the `std::` namespace. Should fix a number of build bots as well.

llvm-svn: 340721
2018-08-27 06:52:14 +00:00
Zachary Turner 03b6f5a5ea [MS Demangler] Add virtual destructor.
Silence -Wnon-virtual-dtor.

llvm-svn: 340711
2018-08-27 04:04:41 +00:00
Zachary Turner 0331286373 [MS Demangler] Re-write the Microsoft demangler.
This is a pretty large refactor / re-write of the Microsoft
demangler.  The previous one was a little hackish because it
evolved as I was learning about all the various edge cases,
exceptions, etc.  It didn't have a proper AST and so there was
lots of custom handling of things that should have been much
more clean.

Taking what was learned from that experience, it's now
re-written with a completely redesigned and much more sensible
AST.  It's probably still not perfect, but at least it's
comprehensible now to someone else who wants to come along
and make some modifications or read the code.

Incidentally, this fixed a couple of bugs, so I've enabled
the tests which now pass.

llvm-svn: 340710
2018-08-27 03:48:03 +00:00
Simon Pilgrim ef467acc2c Fix -Wunused-function warning. NFCI.
llvm-svn: 340687
2018-08-25 17:11:11 +00:00
Zachary Turner ee09170d25 [MS Demangler] Print template constructor args.
Previously if you had something like this:

template<typename T>
struct Foo {
  template<typename U>
  Foo(U);
};

Foo<int> F(3.7);

this would mangle as ??$?0N@?$Foo@H@@QEAA@N@Z

and this would be demangled as:

undname:      __cdecl Foo<int>::Foo<int><double>(double)
llvm-undname: __cdecl Foo<int>::Foo<int>(double)

Note the lack of the constructor template parameter in our
demangling.

This patch makes it so we print the constructor argument list.

llvm-svn: 340356
2018-08-21 22:52:52 +00:00
Zachary Turner df4cd7cbf9 [MS Demangler] Fix a few more edge cases.
I found these by running llvm-undname over a couple hundred
megabytes of object files generated as part of building chromium.
The issues fixed in this patch are:

  1) decltype-auto return types.
  2) Indirect vtables (e.g. const A::`vftable'{for `B'})
  3) Pointers, references, and rvalue-references to member pointers.

I have exactly one remaining symbol out of a few hundred MB of object
files that produces a name we can't demangle, and it's related to
back-referencing.

llvm-svn: 340341
2018-08-21 21:23:49 +00:00
Zachary Turner c175310a09 [MS Demangler] Demangle special operator 'dynamic initializer'.
This is encoded as __E and should print something like
"dynamic initializer for 'Foo'(void)"

This also adds support for dynamic atexit destructor, which is
basically identical but encoded as __F with slightly different
description.

llvm-svn: 340239
2018-08-20 23:59:21 +00:00
Zachary Turner 0002dd467d [MS Demangler] Anonymous namespace hashes can be backreferenced.
Previously we were not remembering the key values of anonymous
namespaces, but we need to do this.

llvm-svn: 340238
2018-08-20 23:58:58 +00:00
Zachary Turner 91c98a858c [MS Demangler] Properly demangle anonymous namespaces.
llvm-svn: 340237
2018-08-20 23:58:35 +00:00
David Blaikie a25e206973 Add missing include (<functional> for std::ref)
llvm-svn: 340205
2018-08-20 20:02:29 +00:00
Richard Smith 8a57f2e012 Move Itanium demangler implementation into a header file and add visitation support.
Summary:
This transforms the Itanium demangler into a generic reusable library that can
be used to build, traverse, and transform Itanium mangled name trees.

This is in preparation for adding a canonicalizing demangler, which
cannot live in the Demangle library for layering reasons. In order to
keep the diffs simpler, this patch moves more code to the new header
than is strictly necessary: in particular, all of the printLeft /
printRight implementations can be moved to the implementation file.
(And indeed we could make them non-virtual now if we wished, and remove
the vptr from Node.)

All nodes are now included in the Kind enumeration, rather than omitting
some of the Expr nodes, and the three different floating-point literal
node types now have distinct Kind values.

As a proof of concept for the visitation / matching mechanism, this
patch implements a Node dumping facility on top of it, replacing the
prior mechanism that produced the pretty-printed output rather than a
tree dump. Sample dump output:

FunctionEncoding(
  NameType("int"),
  NameWithTemplateArgs(
    NestedName(
      NameWithTemplateArgs(
        NameType("A"),
        TemplateArgs(
          {NameType("B")})),
      NameType("f")),
    TemplateArgs(
      {NameType("int")})),
  {},
  <null>,
  QualConst, FunctionRefQual::FrefQualLValue)

As a next step, it would make sense to move the LLVM high-level interface to
the demangler (the itaniumDemangle function and ItaniumPartialDemangler class)
into the Support library, and implement them in terms of the Demangle library.
This would allow the libc++abi demangler implementation to be an identical copy
of the llvm Demangle library, and would allow the LLVM implementation to reuse
LLVM components such as llvm::BumpPtrAllocator, but we'll need to decide how to
coordinate that with the MS ABI demangler, so I'm not doing that in this patch.

No functionality change intended other than the behavior of dump().

Reviewers: erik.pilkington, zturner, chandlerc, dlj

Subscribers: aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D50930

llvm-svn: 340203
2018-08-20 19:44:01 +00:00
Zachary Turner 66555a7bed [MS Demangler] Demangle member pointer template parameters.
llvm-svn: 340199
2018-08-20 19:15:35 +00:00
Zachary Turner d9e925fca4 [MS Demangler] Resolve backreferences eagerly, not lazily.
A while back I submitted a patch to resolve backreferences
lazily, thinking that it was not always possible to know
in advance what type you were looking at until you had completed
a full pass over the input, and therefore it would be impossible
to resolve backreferences eagerly.

This was mistaken though, and turned out to be an unrelated
problem.  In fact, the reverse is true.  You *must* resolve
backreferences eagerly.  This is because certain types of nested
mangled symbols do not share a backreference context with their
parent symbol, and as such, if you try to resolve them lazily
their backreference context will have been lost by the time you
finish demangling the entire input.  On the other hand, resolving
them eagerly appears to always work, and enables us to port
many more tests over.

llvm-svn: 340126
2018-08-18 18:49:48 +00:00
Zachary Turner 4746aa7b8f [MS Demangler] Properly print all thunk types.
We were only printing the vtordisp thunk before as the previous
patch was more aimed at getting special operators working, one
of which was a thunk.  This patch gets all thunk types to print
properly, and adds a test for each one.

llvm-svn: 340088
2018-08-17 21:32:07 +00:00
Zachary Turner 469f076356 [MS Demangler] Demangle all remaining types of operators.
This demangles all remaining special operators including thunks,
RTTI Descriptors, and local static guard variables.

llvm-svn: 340083
2018-08-17 21:18:05 +00:00
Zachary Turner 3461bfaa9c [MS Demangler] Rework the way operators are demangled.
Previously, some of the code for actually parsing mangled
operator names was more like formatting code in nature,
and was interspersed with the demangling code which builds
the AST.  This means that by the time we got to the printing
code, we had lost all information about what type of operator
we had, and all we were left with was a string that we just
had to print.  However, not all operators are actually even
operators.  It's basically just a catch-all mangling for
"special names", and for some of the other types it helps
to know when we're actually doing the printing what it is.

This patch changes the way things work by introducing an
OperatorInfo structure and corresponding enumeration.  When
we demangle we store the enumeration value and demangled
components separately.  This gives more flexibility during
printing.

In doing so, some demanglings of special names which we didn't
previously support come out of this for free, so we now demangle
those.

A few are more complex and are better left for a followup patch
though.

An exhaustive test of every possible operator code is included,
with the ones that don't yet work commented out.

llvm-svn: 340046
2018-08-17 16:14:05 +00:00
Richard Smith a6c34887f7 Factor Node creation out of the demangler. No functionality change
intended.

llvm-svn: 339944
2018-08-16 21:40:57 +00:00
Zachary Turner af738f7277 Fix memory leak in demangling of string literals.
llvm-svn: 339909
2018-08-16 17:48:32 +00:00
Zachary Turner d78fe2f46d Fix -Wmicrosoft-goto warnings.
llvm-svn: 339894
2018-08-16 16:30:27 +00:00
Zachary Turner 970fdc3236 [MS Demangler] Demangle string literals.
When demangling string literals, Microsoft's undname
simply prints 'string'.  This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.

This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type.  But
it works pretty well, and many tests are added to
illustrate the behavior.

Differential Revision: https://reviews.llvm.org/D50806

llvm-svn: 339892
2018-08-16 16:17:36 +00:00
Zachary Turner 83313f8f54 [MS Demangler] Don't fail on MD5-mangled names.
When we have an MD5 mangled name, we shouldn't choke and say
that it's an invalid name.  Even though it's impossible to demangle,
we should just output the original name.

llvm-svn: 339891
2018-08-16 16:17:17 +00:00
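
A minimal sketch of the pass-through behavior described in the commit above, assuming C++17 and assuming that MD5-mangled names are recognized by a leading `??@`. The helper names `looksLikeMD5Name` and `passThroughIfMD5` are hypothetical and do not come from the patch.

```cpp
#include <string>
#include <string_view>

// Assumption: an MD5-mangled name is identified by the "??@" prefix.
bool looksLikeMD5Name(std::string_view Mangled) {
  return Mangled.substr(0, 3) == "??@";
}

// Hypothetical helper: if the name is MD5-mangled, report success and hand
// back the input unchanged, since the hash cannot be reversed. Otherwise
// return false so the caller can run the ordinary demangling path.
bool passThroughIfMD5(std::string_view Mangled, std::string &Out) {
  if (!looksLikeMD5Name(Mangled))
    return false;
  Out.assign(Mangled.data(), Mangled.size());
  return true;
}
```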
Zachary Turner 2bbb23ba3b [MS Demangler] Fix some minor formatting bugs.
1) We were printing __restrict twice on member pointers.  This is fixed
   and relevant tests are re-enabled.

2) Several tests were disabled because of printing slightly
   different output than undname.  These were confirmed to be
   bugs in undname, so we just re-enable the tests.

3) The test for printing reference temporaries is re-enabled.  This
   is a clang mangling extension, so we have some flexibility with
   how we demangle it.  The output currently looks fine, so we just
   re-enable the test with no fixes.

llvm-svn: 339708
2018-08-14 18:54:28 +00:00
Erik Pilkington ac6a801cca [itanium demangler] Add llvm::itaniumFindTypesInMangledName()
This function calls a callback whenever a <type> is parsed.

This is necessary to implement FindAlternateFunctionManglings in LLDB, which
uses a similar hack in FastDemangle. Once that function has been updated to use
this version, FastDemangle can finally be removed.

Differential revision: https://reviews.llvm.org/D50586

llvm-svn: 339580
2018-08-13 16:37:47 +00:00
Zachary Turner 29ec67b62f [MS Demangler] Support extern "C" functions.
There are two cases we need to support with extern "C"
functions.  The first is the case of a '9' indicating that
the function has no prototype.  This occurs when we mangle
a symbol inside of an extern "C" function, but not the
function itself.

The second case is when we have an overloaded extern "C"
function.  In this case we emit $$J0 to indicate this.
This patch adds support for both of these cases.

llvm-svn: 339471
2018-08-10 21:09:05 +00:00
Zachary Turner 073620bc3b [MS Demangler] Demangle cv qualifiers on template args.
Previously, we wouldn't properly demangle something like
Foo<const int>.  Template args have a special escape sequence
'$$C' that is optional, but if it is present contains
qualifiers.  So we need to check for this and, only if it is
present, demangle qualifiers before demangling the type.

With this fix, we re-enable some tests that were previously
marked FIXME.

llvm-svn: 339465
2018-08-10 19:57:36 +00:00
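
The "check for the escape, and only then read qualifiers" step from the commit above can be sketched as follows, assuming C++17. The function and enumerator names are hypothetical, and the qualifier letters (`A` none, `B` const, `C` volatile, `D` const volatile) are stated here as an assumption about the mangling rather than taken from the patch.

```cpp
#include <string>
#include <string_view>

enum Qualifiers : unsigned { Q_None = 0, Q_Const = 1, Q_Volatile = 2 };

// Removes Prefix from the front of S and returns true if S starts with it.
bool consumeFront(std::string_view &S, std::string_view Prefix) {
  if (S.substr(0, Prefix.size()) != Prefix)
    return false;
  S.remove_prefix(Prefix.size());
  return true;
}

// Hypothetical helper: parses the optional "$$C" escape and one qualifier
// letter. If the escape is absent, S is left untouched and Quals is Q_None.
// Returns false only when the escape is present but malformed.
bool parseTemplateArgQualifiers(std::string_view &S, unsigned &Quals) {
  Quals = Q_None;
  if (!consumeFront(S, "$$C"))
    return true; // no escape: nothing to demangle before the type
  if (S.empty())
    return false;
  switch (S.front()) {
  case 'A': break;                               // assumed: no qualifiers
  case 'B': Quals = Q_Const; break;              // assumed: const
  case 'C': Quals = Q_Volatile; break;           // assumed: volatile
  case 'D': Quals = Q_Const | Q_Volatile; break; // assumed: const volatile
  default: return false;
  }
  S.remove_prefix(1);
  return true;
}
```

After the call, `S` points at the start of the type encoding in both cases, so the type parser that follows does not need to know whether the escape was there.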
Zachary Turner a17721cf5d [MS Demangler] Properly demangle conversion operators.
These were completely broken before.  We need to handle
the 'B' operator tag.

llvm-svn: 339436
2018-08-10 15:04:56 +00:00
Zachary Turner dbefc6cd4e [MS Demangler] Fix several issues related to templates.
These were uncovered when porting the mangling tests in
ms-templates.cpp from clang/CodeGenCXX over to demangling
tests.  The main issues fixed here are surrounding integer
literal signed and unsignedness, empty array dimensions,
and pointer and reference non-type template parameters.

Differential Revision: https://reviews.llvm.org/D50512

llvm-svn: 339434
2018-08-10 14:31:04 +00:00
Zachary Turner d346cba91b [MS Demangler] Create a new backref context for template instantiations.
Template manglings use a fresh back-referencing context, so we
need to do the same.  This fixes several existing tests which are
marked as FIXME, so those are now actually run.

llvm-svn: 339275
2018-08-08 17:17:04 +00:00
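
One way to picture the fresh back-referencing context from the commit above is a save-and-restore around the template parse. The sketch below is a toy under that assumption; `BackrefContext`, `ToyDemangler`, and `withFreshBackrefs` are hypothetical names, not the demangler's actual types.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the table of names that back-references index.
struct BackrefContext {
  std::vector<std::string> Names;
};

class ToyDemangler {
public:
  BackrefContext Backrefs;

  // Runs ParseTemplate with an empty back-reference table, then restores the
  // enclosing table, so names recorded inside the template instantiation do
  // not leak out and outer names are not visible inside it.
  template <typename Fn> void withFreshBackrefs(Fn ParseTemplate) {
    BackrefContext Outer;
    std::swap(Outer, Backrefs); // Backrefs is now empty, Outer holds the old table
    ParseTemplate();
    std::swap(Outer, Backrefs); // restore the enclosing table
  }
};
```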
Zachary Turner 58d29cf590 [MS Demangler] Properly handle backreferencing of special names.
Function template names are not stored in the backref table,
but non-template function names are.  The general pattern seems
to be that when you are demangling a symbol name, if the name
starts with '?' it does not go into the backreference table,
otherwise it does.  Note that this even handles the general case
of operator names (template or otherwise) not going into the
back-reference table, anonymous namespaces not going into the
backreference table, etc.

It's important that we apply this check *only* for the
unqualified portion of a name, and only for symbol names.
For example, this does not apply to type names (such as class
templates) and we need to make sure that these still do go
into the backref table.

Differential Revision: https://reviews.llvm.org/D50394

llvm-svn: 339211
2018-08-08 00:43:31 +00:00
Erik Pilkington 90dc82e955 [itanium demangler] Support dot suffixes on block invocation functions
rdar://32378759

llvm-svn: 338747
2018-08-02 17:45:01 +00:00
Zachary Turner ae67218989 Fix one more warning.
llvm-svn: 338742
2018-08-02 17:33:33 +00:00
Zachary Turner 5b0456d0ce Fix a couple of warnings.
llvm-svn: 338739
2018-08-02 17:18:01 +00:00
Zachary Turner 7563ebe391 Use %.*s instead of %*s when formatting strings with explicit length.
llvm-svn: 338737
2018-08-02 17:08:24 +00:00
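
For context on the commit above: in `printf`-style formatting, the `*` in `%*s` supplies a minimum field width, while the `*` in `%.*s` supplies a precision, which for `%s` is the maximum number of bytes to print. The sketch below contrasts the two; the helper names and the sample string are made up for illustration and are not from the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// "%.*s": Len is a precision, so at most Len bytes of Data are printed.
// This is the form to use for a string with an explicit length.
// The fixed buffer keeps the sketch short; output beyond 63 bytes is cut off.
std::string formatWithPrecision(const char *Data, int Len) {
  assert(Data && Len >= 0);
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "%.*s", Len, Data);
  return Buf;
}

// "%*s": Len is only a minimum field width. The whole NUL-terminated string
// is printed, left-padded with spaces if it is shorter than Len.
std::string formatWithWidth(const char *Data, int Len) {
  assert(Data && Len >= 0);
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "%*s", Len, Data);
  return Buf;
}
```

With the input `"?foo@@YAXXZ"` and a length of 4, the precision form yields `?foo`, while the width form yields the entire string. The width form also keeps reading until it finds a NUL byte, which is a problem when the data is a length-delimited view that is not NUL-terminated.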
Zachary Turner 172aea10fa [MS Demangler] Resolve back-references lazily.
We need to both record and resolve back-references lazily due to
not being able to know until a demangling is complete whether or
not a name should go into the back-reference table.  This patch
implements lazy resolution of back-references, but we still have
eager recording of back-references.  This will be fixed in a
subsequent patch.

llvm-svn: 338736
2018-08-02 17:08:03 +00:00
Zachary Turner 5ae08b858d Try to fix FreeBSD build.
It seems like perhaps because cstdio isn't directly included, the
compiler is accidentally picking up wprintf from somewhere else
and trying to call that.  Hopefully this fixes it.

llvm-svn: 338614
2018-08-01 18:44:12 +00:00