Commit Graph

96 Commits

Author SHA1 Message Date
Nico Weber dfc08baceb [MS demangler] Use a slightly shorter unmangling for mangled strings.
Before: const wchar_t * {L"%"}
Now: L"%"

See also PR39593.
Differential Revision: https://reviews.llvm.org/D54294

llvm-svn: 346544
2018-11-09 19:28:50 +00:00
Zachary Turner 78ab3cb238 [MS Demangler] Add support for $$Z parameter pack separator.
$$Z appears between adjacent expanded parameter packs in the
same template instantiation.  We don't need to print it, it's
only there to disambiguate between manglings that would otherwise
be ambiguous.  So we just need to parse it and throw it away.

llvm-svn: 341119
2018-08-30 20:53:29 +00:00
Zachary Turner 32a8a2028c [MS Demangler] Fix several crashes and demangling bugs.
These bugs were found by writing a Python script which spidered
the entire Chromium build directory tree demangling every symbol
in every object file.  At the start, the tool printed:

  Processed 27443 object files.
  2926377/2936108 symbols successfully demangled (99.6686%)
  9731 symbols could not be demangled (0.3314%)
  14589 files crashed while demangling (53.1611%)

After this patch, it prints:

  Processed 27443 object files.
  41295518/41295617 symbols successfully demangled (99.9998%)
  99 symbols could not be demangled (0.0002%)
  0 files crashed while demangling (0.0000%)

The issues fixed in this patch are:

  * Ignore empty parameter packs.  Previously we would encounter
    a mangling for an empty parameter pack and add a null node
    to the AST.  Since we don't print these anyway, we now just
    don't add anything to the AST and ignore it entirely.  This
    fixes some of the crashes.

  * Account for "incorrect" string literal demanglings.  Apparently
    an older version of clang would not truncate mangled string
    literals to 32 bytes of encoded character data.  The demangling
    code however would allocate a 32 byte buffer thinking that it
    would not encounter more than this, and overrun the buffer.
    We now demangle up to 128 bytes of data, since the buggy
    clang would encode up to 32 *characters* of data.

  * Extended support for demangling init-fini stubs.  If you had
    something like
      struct Foo {
        static vector<string> S;
      };
    this would generate a dynamic atexit initializer *for the
    variable*.  We didn't handle this, but now we print something
    nice.  This is actually an improvement over undname, which will
    fail to demangle this at all.

  * Fixed one case of static this adjustment.  We weren't handling
    several thunk codes so we didn't recognize the mangling.  These
    are now handled.

  * Fixed a back-referencing problem.  Member pointer templates
    should have their components considered for back-referencing

The remaining 99 symbols which can't be demangled are all symbols
which are compiler-generated and undname can't demangle either.

llvm-svn: 341000
2018-08-29 23:56:09 +00:00
Zachary Turner b2fef1a0b0 Add support for various C++14 demanglings.
Mostly this includes <auto> and <decltype-auto> return values.
Additionally, this fixes a fairly obscure back-referencing bug
that was encountered in one of the C++14 tests, which is that
if you have something like Foo<&bar, &bar> then the `bar`
forms a backreference.

llvm-svn: 340896
2018-08-29 04:12:44 +00:00
Zachary Turner 0331286373 [MS Demangler] Re-write the Microsoft demangler.
This is a pretty large refactor / re-write of the Microsoft
demangler.  The previous one was a little hackish because it
evolved as I was learning about all the various edge cases,
exceptions, etc.  It didn't have a proper AST and so there was
lots of custom handling of things that should have been much
more clean.

Taking what was learned from that experience, it's now
re-written with a completely redesigned and much more sensible
AST.  It's probably still not perfect, but at least it's
comprehensible now to someone else who wants to come along
and make some modifications or read the code.

Incidentally, this fixed a couple of bugs, so I've enabled
the tests which now pass.

llvm-svn: 340710
2018-08-27 03:48:03 +00:00
Zachary Turner ee09170d25 [MS Demangler] Print template constructor args.
Previously if you had something like this:

template<typename T>
struct Foo {
  template<typename U>
  Foo(U);
};

Foo F(3.7);

this would mangle as ??$?0N@?$Foo@H@@QEAA@N@Z

and this would be demangled as:

undname:      __cdecl Foo<int>::Foo<int><double>(double)
llvm-undname: __cdecl Foo<int>::Foo<int>(double)

Note the lack of the constructor template parameter in our
demangling.

This patch makes it so we print the constructor argument list.

llvm-svn: 340356
2018-08-21 22:52:52 +00:00
Zachary Turner df4cd7cbf9 [MS Demangler] Fix a few more edge cases.
I found these by running llvm-undname over a couple hundred
megabytes of object files generated as part of building chromium.
The issues fixed in this patch are:

  1) decltype-auto return types.
  2) Indirect vtables (e.g. const A::`vftable'{for `B'})
  3) Pointers, references, and rvalue-references to member pointers.

I have exactly one remaining symbol out of a few hundred MB of object
files that produces a name we can't demangle, and it's related to
back-referencing.

llvm-svn: 340341
2018-08-21 21:23:49 +00:00
Zachary Turner c175310a09 [MS Demangler] Demangle special operator 'dynamic initializer'.
This is encoded as __E and should print something like
"dynamic initializer for 'Foo'(void)"

This also adds support for dynamic atexit destructor, which is
basically identical but encoded as __F with slightly different
description.

llvm-svn: 340239
2018-08-20 23:59:21 +00:00
Zachary Turner 0002dd467d [MS Demangler] Anonymous namespace hashes can be backreferenced.
Previously we were not remembering the key values of anonymous
namespaces, but we need to do this.

llvm-svn: 340238
2018-08-20 23:58:58 +00:00
Zachary Turner 91c98a858c [MS Demangler] Properly demangle anonymous namespaces.
llvm-svn: 340237
2018-08-20 23:58:35 +00:00
Zachary Turner 66555a7bed [MS Demangler] Demangle member pointer template parameters.
llvm-svn: 340199
2018-08-20 19:15:35 +00:00
Zachary Turner d9e925fca4 [MS Demangler] Resolve backreferences eagerly, not lazily.
A while back I submitted a patch to resolve backreferences
lazily, thinking this that it was not always possible to know
in advance what type you were looking at until you had completed
a full pass over the input, and therefore it would be impossible
to resolve backreferences eagerly.

This was mistaken though, and turned out to be an unrelated
problem.  In fact, the reverse is true.  You *must* resolve
backreferences eagerly.  This is because certain types of nested
mangled symbols do not share a backreference context with their
parent symbol, and as such, if you try to resolve them lazily
their backreference context will have been lost by the time you
finish demangling the entire input.  On the other hand, resolving
them eagerly appears to always work, and enables us to port
many more tests over.

llvm-svn: 340126
2018-08-18 18:49:48 +00:00
Zachary Turner 4746aa7b8f [MS Demangler] Properly print all thunk types.
We were only printing the vtordisp thunk before as the previous
patch was more aimed at getting special operators working, one
of which was a thunk.  This patch gets all thunk types to print
properly, and adds a test for each one.

llvm-svn: 340088
2018-08-17 21:32:07 +00:00
Zachary Turner 469f076356 [MS Demangler] Demangle all remaining types of operators.
This demangles all remaining special operators including thunks,
RTTI Descriptors, and local static guard variables.

llvm-svn: 340083
2018-08-17 21:18:05 +00:00
Zachary Turner 3461bfaa9c [MS Demangler] Rework the way operators are demangled.
Previously, some of the code for actually parsing mangled
operator names was more like formatting code in nature,
and was interspersed with the demangling code which builds
the AST.  This means that by the time we got to the printing
code, we had lost all information about what type of operator
we had, and all we were left with was a string that we just
had to print.  However, not all operators are actually even
operators.  it's basically just a catch-all mangling for
"special names", and for some of the other types it helps
to know when we're actually doing the printing what it is.

This patch changes the way things work by introducing an
OperatorInfo structure and corresponding enumeration.  When
we demangle we store the enumeration value and demangled
components separately.  This gives more flexibility during
printing.

In doing so, some demanglings of special names which we didn't
previously support come out of this for free, so we now demangle
those.

A few are more complex and are better left for a followup patch
though.

An exhaustive test of every possible operator code is included,
with the ones that don't yet work commented out.

llvm-svn: 340046
2018-08-17 16:14:05 +00:00
Zachary Turner 970fdc3236 [MS Demangler] Demangle string literals.
When demangling string literals, Microsoft's undname
simply prints 'string'.  This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.

This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type.  But
it works pretty well, and many tests are added to
illustrate the behavior.

Differential Revision: https://reviews.llvm.org/D50806

llvm-svn: 339892
2018-08-16 16:17:36 +00:00
Zachary Turner 83313f8f54 [MS Demangler] Don't fail on MD5-mangled names.
When we have an MD5 mangled name, we shouldn't choke and say
that it's an invalid name.  Even though it's impossible to demangle,
we should just output the original name.

llvm-svn: 339891
2018-08-16 16:17:17 +00:00
Zachary Turner 2bbb23ba3b [MS Demangler] Fix some minor formatting bugs.
1) We print __restrict twice on member pointers.  This is fixed
   and relevant tests are re-enabled.

2) Several tests were disabled because of printing slightly
   different output than undname.  These were confirmed to be
   bugs in undname, so we just re-enable the tests.

3) The test for printing reference temporaries is re-enabled.  This
   is a clang mangling extension, so we have some flexibility with
   how we demangle it.  The output currently looks fine, so we just
   re-enable the test with no fixes.

llvm-svn: 339708
2018-08-14 18:54:28 +00:00
Zachary Turner 29ec67b62f [MS Demangler] Support extern "C" functions.
There are two cases we need to support with extern "C"
functions.  The first is the case of a '9' indicating that
the function has no prototype.  This occurs when we mangle
a symbol inside of an extern "C" function, but not the
function itself.

The second case is when we have an overloaded extern "C"
functions.  In this case we emit $$J0 to indicate this.
This patch adds support for both of these cases.

llvm-svn: 339471
2018-08-10 21:09:05 +00:00
Zachary Turner 909b819cf9 Resubmit r339450 - [MS Demangler] Add conversion operator tests
This was broken because of a malformed check line.  Incidentally,
this exposed a case where we crash when we should just be returning
an error, so we should fix that.  The demangler shouldn't crash due
to user input.

llvm-svn: 339466
2018-08-10 20:08:46 +00:00
Zachary Turner 073620bc3b [MS Demangler] Demangle cv qualifiers on template args.
Before we wouldn't properly demangle something like
Foo<const int>.  Template args have a special escape sequence
'$$C' that is optional, but if it is present contains
qualifiers.  So we need to check for this and only if it
present, demangle qualifiers before demangling the type.

With this fix, we re-enable some tests that were previously
marked FIXME.

llvm-svn: 339465
2018-08-10 19:57:36 +00:00
Sanjay Patel 8988b8d92c revert r339450 - [MS Demangler] Add conversion operator tests
Something here causes an assertion failure that killed a bunch of bots.
Example:
http://lab.llvm.org:8011/builders/reverse-iteration/builds/7021/steps/check_all/logs/stdio

llvm-svn: 339463
2018-08-10 19:20:16 +00:00
Zachary Turner d664117794 [MS Demangler] Add conversion operator tests.
The mangled names were added in the original commit, but
the demangled equivalents weren't, so nothing was actually
being checked.

llvm-svn: 339450
2018-08-10 16:55:59 +00:00
Zachary Turner a17721cf5d [MS Demangler] Properly demangle conversion operators.
These were completely broken before.  We need to handle
the 'B' operator tag.

llvm-svn: 339436
2018-08-10 15:04:56 +00:00
Zachary Turner e89f2fa657 [MS Demangler] Disable a couple of tests.
The check lines are marked FIXME but not the mangled names.
This is causing an error.

llvm-svn: 339435
2018-08-10 14:53:33 +00:00
Zachary Turner dbefc6cd4e [MS Demangler] Fix several issues related to templates.
These were uncovered when porting the mangling tests in
ms-templates.cpp from clang/CodeGenCXX over to demangling
tests.  The main issues fixed here are surrounding integer
literal signed and unsignedness, empty array dimensions,
and pointer and reference non-type template parameters.

Differential Revision: https://reviews.llvm.org/D50512

llvm-svn: 339434
2018-08-10 14:31:04 +00:00
Zachary Turner d346cba91b [MS Demangler] Create a new backref context for template instantiations.
Template manglings use a fresh back-referencing context, so we
need to do the same.  This fixes several existing tests which are
marked as FIXME, so those are now actually run.

llvm-svn: 339275
2018-08-08 17:17:04 +00:00
Zachary Turner 58d29cf590 [MS Demangler] Properly handle backreferencing of special names.
Function template names are not stored in the backref table,
but non-template function names are.  The general pattern seems
to be that when you are demangling a symbol name, if the name
starts with '?' it does not go into the backreference table,
otherwise it does.  Note that this even handles the general case
of operator names (template or otherwise) not going into the
back-reference table, anonymous namespaces not going into the
backreference table, etc.

It's important that we apply this check *only* for the
unqualified portion of a name, and only for symbol names.
For example, this does not apply to type names (such as class
templates) and we need to make sure that these still do go
into the backref table.

Differential Revision: https://reviews.llvm.org/D50394

llvm-svn: 339211
2018-08-08 00:43:31 +00:00
Zachary Turner 666de23fbf [MS Demangler] Fix some tests that are no longer broken.
These were fixed with earlier patches, but had not yet been
re-enabled.

llvm-svn: 338778
2018-08-02 22:37:40 +00:00
Zachary Turner 44ebbc216a [MS Demangler] Properly demangle templated operators.
After we detected the presence of a template via ?$ we would proceed by
only demangling a simple unqualified name. This means we would fail on
templated operators (and perhaps other yet-to-be-determined things)

This was discovered while doing some refactoring to store richer
semantic information about the demangled types to pave the way for
overhauling the way we handle backreferences. (Specifically, we need to
defer recording or resolving back-references until a symbol has been
completely demangled, because we need to use information that only
occurs later in the mangled string to decide whether a back-reference
should be recorded.)

Differential Revision: https://reviews.llvm.org/D50145

llvm-svn: 338608
2018-08-01 18:32:47 +00:00
Zachary Turner d30700f82d Resubmit r338340 "[MS Demangler] Better demangling of template arguments."
This broke the build with GCC, but has since been fixed.

llvm-svn: 338403
2018-07-31 17:16:44 +00:00
Reid Kleckner d2bad6c639 Revert r338340 "[MS Demangler] Better demangling of template arguments."
Breaks the build with GCC, apparently.

llvm-svn: 338344
2018-07-31 01:08:42 +00:00
Zachary Turner 4f85809a84 [MS Demangler] Better demangling of template arguments.
This patch fixes demangling of template aliases as template-template
arguments, and also fixes function pointers and references as
not type template parameters.  All of these can be properly
demangled now, so I've ported over the test
clang/test/CodeGenCXX/ms-template-callbacks.cpp.  All of these
tests pass

llvm-svn: 338340
2018-07-31 00:26:52 +00:00
Zachary Turner a51403f5cc [MS Demangler] Add ms-return-qualifiers.test.
This is a copy of the tests from
clang/CodeGenCXX/ms-return-qualifiers.cpp converted to demangling
tests.

llvm-svn: 338330
2018-07-30 23:22:39 +00:00
Zachary Turner 931e879cef [MS Demangler] Add rudimentary C++11 Support
This patch adds support for demangling r-value references, new
operators such as the ""_foo operator, lambdas, alias types,
nullptr_t, and various other C++11'isms.

There is 1 failing test remaining in this file, which appears to
be related to back-referencing. This type of problem has the
potential to get ugly so I'd rather fix it in a separate patch.

Differential Revision: https://reviews.llvm.org/D50013

llvm-svn: 338324
2018-07-30 23:02:10 +00:00
Zachary Turner 71c91f948a [MS Demangler] Demangle symbols in function scopes.
There are a couple of issues you run into when you start getting into
more complex names, especially with regards to function local statics.
When you've got something like:

    int x() {
      static int n = 0;
      return n;
    }

Then this needs to demangle to something like

    int `int __cdecl x()'::`1'::n

The nested mangled symbols (e.g. `int __cdecl x()` in the above
example) also share state with regards to back-referencing, so
we need to be able to re-use the demangler in the middle of
demangling a symbol while sharing back-ref state.

To make matters more complicated, there are a lot of ambiguities
when demangling a symbol's qualified name, because a function local
scope pattern (usually something like `?1??name?`) looks suspiciously
like many other possible things that can occur, such as `?1` meaning
the second back-ref and disambiguating these cases is rather
interesting.  The `?1?` in a local scope pattern is actually a special
case of the more general pattern of `? + <encoded number> + ?`, where
"encoded number" can itself have embedded `@` symbols, which is a
common delimeter in mangled names.  So we have to take care during the
disambiguation, which is the reason for the overly complicated
`isLocalScopePattern` function in this patch.

I've added some pretty obnoxious tests to exercise all of this, which
exposed several other problems related to back-referencing, so those
are fixed here as well. Finally, I've uncommented some tests that were
previously marked as `FIXME`, since now these work.

Differential Revision: https://reviews.llvm.org/D49965

llvm-svn: 338226
2018-07-30 03:12:34 +00:00
Zachary Turner 23df1319ca [MS Demangler] Properly handle function parameter back-refs.
Properly demangle function parameter back-references.

Previously we treated lists of function parameters and template
parameters the same. There are some important differences with regards
to back-references, and some less important differences regarding which
characters can appear before or after the name.

The important differences are that with a given type T, all instances of
a function parameter list share the same global back-ref table.
Specifically, if X and Y are function pointers, then there are 3
entities in the declaration X func(Y) which all affect and are affected
by the master parameter back-ref table:
  1) The parameter list of X's function type
  2) the parameter list of func itself
  3) The parameter list of Y's function type.

The previous code would create a back-reference table that was local to
a single parameter list, so it would not be shared across parameter
lists.

This was discovered when porting ms-back-references.test from clang's
mangling tests. All of these tests should now pass with the new changes.

In doing so, I split the function for parsing template and function
parameters into two separate functions. This makes the template
parameter list parsing code in particular very small and easy to
understand now.

Differential Revision: https://reviews.llvm.org/D49875

llvm-svn: 338075
2018-07-26 22:13:39 +00:00
Zachary Turner 024e1762aa [MS Demangler] Print calling convention inside parentheses.
For function pointers, we would print something like

int __cdecl (*)(int)

We need to move the calling convention inside, and print

int (__cdecl *)(int)

This patch implements this change for regular function pointers as
well as member function pointers.

llvm-svn: 338068
2018-07-26 20:33:48 +00:00
Zachary Turner ca7aef10c4 [MS Demangler] Add ms-arg-qualifiers.test
This converts the arg qualifier mangling tests from
clang/CodeGenCXX/mangle-ms-arg-qualifiers.cpp to demangling tests.
Most tests already pass, so this patch doesn't come with any
functional change, just the addition of new tests.  The few tests
that don't pass are left in with a FIXME label so that they don't
run but serve as documentation about what still doesn't work.

llvm-svn: 338067
2018-07-26 20:25:35 +00:00
Zachary Turner f4c4519532 Add missing tests from ms-mangle.cpp.
None of these tests pass yet so they are commented out, but I'm
adding them with a FIXME label so that they don't get lost when
copying tests over from clang's mangling tests.  Currently these
tests are all commented out.

llvm-svn: 338066
2018-07-26 20:20:29 +00:00
Zachary Turner 38b78a7f0e [MS Demangler] Demangle pointers to member functions.
After this patch, we can now properly demangle pointers to member
functions.  The calling convention is located in the wrong place,
but this will be fixed in a followup since it also affects non
member function pointers.

Differential Revision: https://reviews.llvm.org/D49639

llvm-svn: 338065
2018-07-26 20:20:10 +00:00
Zachary Turner d742d645a1 [MS Demangler] Demangle data member pointers.
Differential Revision: https://reviews.llvm.org/D49630

llvm-svn: 338061
2018-07-26 19:56:09 +00:00
Zachary Turner 91ecedd29e Fix a few warnings and style issues in MS demangler.
Also remove a broken test case.

llvm-svn: 337591
2018-07-20 18:07:33 +00:00
Zachary Turner f435a7eada Add a Microsoft Demangler.
This adds initial support for a demangling library (LLVMDemangle)
and tool (llvm-undname) for demangling Microsoft names.  This
doesn't cover 100% of cases and there are some known limitations
which I intend to address in followup patches, at least until such
time that we have (near) 100% test coverage matching up with all
of the test cases in clang/test/CodeGenCXX/mangle-ms-*.

Differential Revision: https://reviews.llvm.org/D49552

llvm-svn: 337584
2018-07-20 17:27:48 +00:00
Saleem Abdulrasool 7091820a96 llvm-cxxfilt: support reading from stdin
`c++filt` when given no arguments runs as a REPL, decoding each line as a
decorated name.  Unify the test structure to be more uniform, with the tests for
llvm-cxxfilt living under test/tools/llvm-cxxfilt.

llvm-svn: 286777
2016-11-13 20:43:38 +00:00
Rafael Espindola b940b66c60 Add an c++ itanium demangler to llvm.
This adds a copy of the demangler in libcxxabi.

The code also has no dependencies on anything else in LLVM. To enforce
that I added it as another library. That way a BUILD_SHARED_LIBS will
fail if anyone adds an use of StringRef for example.

The no llvm dependency combined with the fact that this has to build
on linux, OS X and Windows required a few changes to the code. In
particular:

    No constexpr.
    No alignas

On OS X at least this library has only one global symbol:
__ZN4llvm16itanium_demangleEPKcPcPmPi

My current plan is:

    Commit something like this
    Change lld to use it
    Change lldb to use it as the fallback

    Add a few #ifdefs so that exactly the same file can be used in
    libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names this
implementation can be replaced with it and we will have the one true
demangler.

llvm-svn: 280732
2016-09-06 19:16:48 +00:00