Commit Graph

515 Commits

Author SHA1 Message Date
Manuel Klimek 836c2868f9 Add an option to not indent declarations when breaking after the type.
Make that option the default for LLVM style.

llvm-svn: 184563
2013-06-21 17:25:42 +00:00
Alexander Kornienko b93062e236 Use the same set of whitespace characters for all operations in BreakableToken.
Summary:
Fixes a problem where \t,\v or \f could lead to a crash when placed as
a first character in a line comment. The cause is that rtrim and ltrim handle
these characters, but our code didn't, so some invariants could be broken.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1013

llvm-svn: 184425
2013-06-20 13:58:37 +00:00
Alexander Kornienko a3555e2416 Fixed long-standing issue with incorrect length calculation of multi-line comments.
Summary:
A trailing block comment having multiple lines would cause extremely
high penalties if the summary length of its lines is more than the column limit.
Fixed by always considering only the last line of a multi-line block comment.
Removed a long-standing FIXME from relevant tests and added a motivating test
modelled after problem cases from real code.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1010

llvm-svn: 184340
2013-06-19 19:50:11 +00:00
Alexander Kornienko 7285207486 Split long strings on word boundaries.
Summary: Split strings at word boundaries, when there are no spaces and slashes.

Reviewers: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1003

llvm-svn: 184304
2013-06-19 14:22:47 +00:00
Alexander Kornienko afaa8f556b Fix a problem in ExpressionParser leading to trailing comments affecting indentation of an expression after a line break.
Summary:
E.g. the second line in 

return aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +
           b; //

is indented 4 characters more than in

return aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +
       b;

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D984

llvm-svn: 184078
2013-06-17 13:19:53 +00:00
Alexander Kornienko 4d26b6efef Fixes incorrect indentation of line comments after break and re-alignment.
Summary:
Selectively propagate the information about token kind in
WhitespaceManager::replaceWhitespaceInToken.For correct alignment of new
segments of line comments in order to align them correctly. Don't set
BreakBeforeParameter in breakProtrudingToken for line comments, as it introduces
a break after the _next_ parameter. Added tests for related functions.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D980

llvm-svn: 184076
2013-06-17 12:59:44 +00:00
Alexander Kornienko be633908be Don't remove backslashes from block comments.
Summary:
Don't remove backslashes from block comments. Previously this
/* \    \ \ \ \ \
*/
would be turned to this:
/*
*/
which spoils some kinds of ASCII-art, people use in their comments. The behavior
was related to handling escaped newlines in block comments inside preprocessor
directives. This patch makes handling it in a more civilized way.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D979

llvm-svn: 183978
2013-06-14 11:46:10 +00:00
Alexander Kornienko f370ad9055 Preserve newlines before block comments in static initializers.
Summary:
Basically, don't special-case line comments in this regard. And fixed
an incorrect test, that relied on the wrong behavior.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D962

llvm-svn: 183851
2013-06-12 19:04:12 +00:00
Alexander Kornienko 555efc36d0 Insert a space at the start of a line comment in case it starts with an alphanumeric character.
Summary:
"//Test" becomes "// Test". This change is aimed to improve code
readability and conformance to certain coding styles. If a comment starts with a
non-alphanumeric character, the space isn't added, e.g. "//-*-c++-*-" stays
unchanged.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D949

llvm-svn: 183750
2013-06-11 16:01:49 +00:00
Alexander Kornienko ee4ca9ba0e Improved handling of escaped newlines at the token start.
Summary: Remove them from the TokenText as well.

Reviewers: klimek

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D935

llvm-svn: 183536
2013-06-07 17:45:07 +00:00
Alexander Kornienko dd7ece53a2 Fixed calculation of penalty when breaking tokens.
Summary:
Introduced two new style parameters: PenaltyBreakComment and
PenaltyBreakString. Add penalty for each character of a breakable token beyond
the column limit (this relates mainly to comments, as they are broken only on
whitespace). Tuned PenaltyBreakComment to prefer comment breaking over breaking
inside most binary expressions.
Fixed a bug that prevented *, & and && from being considered TT_BinaryOperator
in the presense of adjacent comments.

Reviewers: klimek, djasper

Reviewed By: klimek

CC: cfe-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D933

llvm-svn: 183530
2013-06-07 16:02:52 +00:00
Daniel Jasper 61a40782df Fix incorrect line breaking before trailing block comments.
Before, clang-format would happily move a trailing block comment to a
new line, which normally changes the perceived binding of that comment.

E.g., it would move:
void f() { /* comment */
  ...
}
to:
void f() {
  /* comment */
  ...
}

llvm-svn: 183420
2013-06-06 16:08:57 +00:00
Daniel Jasper 6dcecb6b33 Fix clang-format's expression parser for leading }s.
The leading "}" in the construct "} else if (..) {" was confusing the
expression parser. Thus, no fake parentheses were generated and the
indentation was broken in some cases.

llvm-svn: 183393
2013-06-06 09:11:58 +00:00
Daniel Jasper 058663787e Improve c-style cast detection.
Before:
return (my_int) aaaa;
template <> void f<int>(int i)SOME_ANNOTATION;
f("aaaa" SOME_MACRO(aaaa)"aaaa");

After:
return (my_int)aaaa;
template <> void f<int>(int i) SOME_ANNOTATION;
f("aaaa" SOME_MACRO(aaaa) "aaaa");

llvm-svn: 183389
2013-06-06 08:20:20 +00:00
Alexander Kornienko ffcc010767 UTF-8 support for clang-format.
Summary:
Detect if the file is valid UTF-8, and if this is the case, count code
points instead of just using number of bytes in all (hopefully) places, where
number of columns is needed. In particular, use the new
FormatToken.CodePointCount instead of TokenLength where appropriate.
Changed BreakableToken implementations to respect utf-8 character boundaries
when in utf-8 mode.

Reviewers: klimek, djasper

Reviewed By: djasper

CC: cfe-commits, rsmith, gribozavr

Differential Revision: http://llvm-reviews.chandlerc.com/D918

llvm-svn: 183312
2013-06-05 14:09:10 +00:00
Alexander Kornienko 4b67207157 Moved FormatToken to a separate header.
llvm-svn: 183115
2013-06-03 16:45:03 +00:00
Daniel Jasper 1027c6e5dd Let clang-format remove empty lines before "}".
These lines almost never aid readability.

Before:
void f() {
  int i;  // some variable

}

After:
void f() {
  int i;  // some variable
}

llvm-svn: 183112
2013-06-03 16:16:41 +00:00
Daniel Jasper 8050395236 Improve detection preventing certain kind of formatting patterns.
An oversight in this detection made clang-format unable to format
the following nicely:
void aaaaaaaaaaaaaaaaaaa<aaaaaaaaaaaaaaaaaaaaaaaaaaa,
                         bbbbbbbbbbbbbbbbbbbbbbbbbb>(
    cccccccccccccccccccccccccccc);

llvm-svn: 183097
2013-06-03 09:54:46 +00:00
Daniel Jasper 68d888cfed Fix line-breaking problem caused by comment.
Before, clang-format would not find a solution for formatting:
if ((aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ||
     bbbbbbbbbbbbbbbbbb) && // aaaaaaaaaaaaaaaa
    cccccc) {
}

llvm-svn: 183096
2013-06-03 08:42:05 +00:00
Daniel Jasper d589391d07 Improve recognition of template parameters.
Before: return a<b &&c> d;
After:  return a < b && c > d;
llvm-svn: 183077
2013-06-01 18:56:00 +00:00
Daniel Jasper da6f225ef9 Improve clang-format's c-style cast detection.
Before:
  x[(uint8) y];
  x = (uint8) y;
  void f() { x = (uint8) y; }
  #define AA(X) sizeof(((X *) NULL)->a)

After:
  x[(uint8)y];
  x = (uint8)y;
  void f() { x = (uint8)y; }
  #define AA(X) sizeof(((X *)NULL)->a)

llvm-svn: 183014
2013-05-31 16:14:28 +00:00
Daniel Jasper 393564fdfe Improve clang-format's error recovery.
If a "}" is found inside parenthesis, this is probably a case of
missing parenthesis. This enables continuing to format after stuff code
like:

class A {
  void f(
};
..

llvm-svn: 183009
2013-05-31 14:56:29 +00:00
Daniel Jasper a9eb2aafa1 Make formatting of empty blocks more consistent.
With this patch, the simplified rule is:
If the block is part of a declaration (class, namespace, function,
enum, ..), merge an empty block onto a single line. Otherwise
(specifically for the compound statements of if, for, while, ...),
keep the braces on two separate lines.

The reasons are:
- Mostly the formatting of empty blocks does not matter much.
- Empty compound statements are really rare and are usually just
  inserted while still working on the code. If they are on two lines,
  inserting code is easier. Also, overlooking the "{}" of an
  "if (...) {}" can be really bad.
- Empty declarations are not uncommon, e.g. empty constructors. Putting
  them on one line saves vertical space at no loss of readability.

llvm-svn: 183008
2013-05-31 14:56:20 +00:00
Daniel Jasper 2c611c0341 Properly format nested conditional operators.
Before:
bool aaaaaa = aaaaaaaaaaaaa //
                  ? aaaaaaaaaaaaaaa
                  : bbbbbbbbbbbbbbb //
                  ? ccccccccccccccc
                  : ddddddddddddddd;

After:
bool aaaaaa = aaaaaaaaaaaaa //
                  ? aaaaaaaaaaaaaaa
                  : bbbbbbbbbbbbbbb //
                        ? ccccccccccccccc
                        : ddddddddddddddd;

llvm-svn: 183007
2013-05-31 14:56:12 +00:00
Daniel Jasper 5648cb32d9 Fix detection/formatting of braced lists in ternary expressions.
Before:
foo = aaaaaaaaaaa ? vector<int> {
  aaaaaaaaaaaaaaaaaaaaaaaaaaa, aaaaaaaaaaaaaaaaaaaa, aaaaa
}
: vector<int>{ bbbbbbbbbbbbbbbbbbbbbbbbbbb, bbbbbbbbbbbbbbbbbbbb, bbbbb };

After:
foo = aaaaaaaaaaa ? vector<int>{ aaaaaaaaaaaaaaaaaaaaaaaaaaa,
                                 aaaaaaaaaaaaaaaaaaaa, aaaaa }
                  : vector<int>{ bbbbbbbbbbbbbbbbbbbbbbbbbbb,
                                 bbbbbbbbbbbbbbbbbbbb, bbbbb };

llvm-svn: 182992
2013-05-31 10:09:55 +00:00
Daniel Jasper ce257f296b More fixes for clang-format's multiline comment breaking.
llvm-svn: 182940
2013-05-30 17:27:48 +00:00
Daniel Jasper 58dd2f0652 Fix another clang-format crasher related to multi-line comments.
This fixes:
/*
*
* something long going over the column limit.
*/

llvm-svn: 182932
2013-05-30 15:20:29 +00:00
Manuel Klimek 8910d192d0 Add asserts to guard against regressions.
llvm-svn: 182916
2013-05-30 07:45:53 +00:00
Daniel Jasper 51fb2b2151 Fix crasher when formatting certain block comments.
Smallest reproduction:
/*
**
*/

llvm-svn: 182913
2013-05-30 06:40:07 +00:00
Manuel Klimek ae1fbfb740 Fixes error when splitting block comments.
When trying to fall back to search from the end onwards, we
would still find leading whitespace if the leading whitespace
went on after the end of the line.

llvm-svn: 182886
2013-05-29 22:06:18 +00:00
Manuel Klimek 4c5c28bb36 Use a non-recursive implementation to reconstruct line breaks.
Now that the TokenAnnotator does not require stack space anymore,
reconstructing the lines has become the limiting factor. This patch
fixes that problem, allowing large files with multiple megabytes of
single unwrapped lines to be formatted.

llvm-svn: 182861
2013-05-29 15:10:11 +00:00
Manuel Klimek 6e6310ec84 The second step in the token refactoring.
Gets rid of AnnotatedToken, putting everything into FormatToken.
FormatTokens are created once, and only referenced by pointer.  This
enables multiple future features, like having tokens shared between
multiple UnwrappedLines (while there's still work to do to fully enable
that).

llvm-svn: 182859
2013-05-29 14:47:47 +00:00
Daniel Jasper 41a0f78d43 Add return missing in r182855.
llvm-svn: 182856
2013-05-29 14:09:17 +00:00
Daniel Jasper 40e1921f2a Leave some macros on their own line
If an identifier is on its own line and it is all upper case, it is highly
likely that this is a macro that is meant to stand on a line by itself.

Before:
class A : public QObject {
  Q_OBJECT A() {}
};

Ater:
class A : public QObject {
  Q_OBJECT

  A() {}
};

llvm-svn: 182855
2013-05-29 13:16:10 +00:00
Daniel Jasper 61e6bbf850 Add option to always break template declarations.
With option enabled (e.g. in Google-style):
template <typename T>
void f() {}

With option disabled:
template <typename T> void f() {}

Enabling this for Google-style and Chromium-style, not sure which other
styles would prefer that.

llvm-svn: 182849
2013-05-29 12:07:31 +00:00
Daniel Jasper 12eba0a699 Remove obsolete variable as discovered in post-commit review.
llvm-svn: 182796
2013-05-28 19:11:43 +00:00
Daniel Jasper 1ec31065e8 Support uniform inits in braced lists.
This made it necessary to remove an error detection which would let us
bail out of braced lists in certain situations of missing "}". However,
as we always entirely escape from the braced list on finding ";", this
should not be a big problem.

With this, we can no format braced lists with uniformat inits:

return { arg1, SomeType { parameter } };

llvm-svn: 182788
2013-05-28 18:50:02 +00:00
Daniel Jasper 4d03d3b327 Fix formatting regression regarding pointers to arrays.
Before: f( (*PointerToArray)[10]);
After:  f((*PointerToArray)[10]);

This fixes llvm.org/PR16163

llvm-svn: 182777
2013-05-28 15:27:10 +00:00
Manuel Klimek 591ab5a830 Make UnwrappedLines and AnnotatedToken contain pointers to FormatToken.
The FormatToken is now not copyable any more.

llvm-svn: 182772
2013-05-28 13:42:28 +00:00
Manuel Klimek 15dfe7ac40 A first step towards giving format tokens pointer identity.
With this patch, we create all tokens in one go before parsing and pass
an ArrayRef<FormatToken*> to the UnwrappedLineParser. The
UnwrappedLineParser is switched to use pointer-to-token internally.

The UnwrappedLineParser still copies the tokens into the UnwrappedLines.
This will be fixed in an upcoming patch.

llvm-svn: 182768
2013-05-28 11:55:06 +00:00
Daniel Jasper bca4bbe30a Initial support for designated initializers.
llvm-svn: 182767
2013-05-28 11:30:49 +00:00
Manuel Klimek 34d15151c4 Disable tab expansion when counting the columns in block comments.
To fully support this, we also need to expand tabs in the text before
the block comment. This patch breaks indentation when there was a
non-standard mixture of spaces and tabs used for indentation, but
fixes a regression in the simple case:
{
  /*
   * Comment.
   */
  int i;
}
Is now formatted correctly, if there were tabs used for indentation
before.

llvm-svn: 182760
2013-05-28 10:01:59 +00:00
Manuel Klimek 281dcbe026 Fixes indentation of empty lines in block comments.
Block comment indentation of empty lines regressed, as we did not
have a test for it.
 /* Comment with...
  *
  * empty line. */
is now formatted correctly again.

llvm-svn: 182757
2013-05-28 08:55:01 +00:00
Daniel Jasper 3719428c06 Clean up formatting of function types.
Before:
int (*func)(void*);
void f() { int(*func)(void*); }

After (consistent space after "int"):
int (*func)(void*);
void f() { int (*func)(void*); }

llvm-svn: 182756
2013-05-28 08:33:00 +00:00
Daniel Jasper 9f82df295e Fix formatting of expressions containing ">>".
This gets turned into two ">" operators at the beginning in order to
simplify template parameter handling. Thus, we need a special case to
handle those two binary operators correctly.

With this patch, clang-format can now correctly handle cases like:
aaaaaa = aaaaaaa(aaaaaaa, // break
                 aaaaaa) >>
         bbbbbb;

llvm-svn: 182754
2013-05-28 07:42:44 +00:00
David Blaikie 8f6a2972ce Remove unreachable return
llvm-svn: 182742
2013-05-27 20:43:54 +00:00
Daniel Jasper 1eff9080af Improve formatting of templates.
Before: A < int&& > a;
After:  A<int &&> a;

Also remove obsolete FIXMEs.

llvm-svn: 182741
2013-05-27 16:36:33 +00:00
Manuel Klimek 9043c74f49 Major refactoring of BreakableToken.
Unify handling of whitespace when breaking protruding tokens with other
whitespace replacements.

As a side effect, the BreakableToken structure changed significantly:
- have a common base class for single-line breakable tokens, as they are
  much more similar
- revamp handling of multi-line comments; we now calculate the
  information about lines in multi-line comments similar to normal
  tokens, and always issue replacements

As a result, we were able to get rid of special casing of trailing
whitespace deletion for comments in the whitespace manager and the
BreakableToken and fixed bugs related to tab handling and escaped
newlines.

llvm-svn: 182738
2013-05-27 15:23:34 +00:00
Daniel Jasper 7b27a10b1e Improve indentation of assignments.
Before:
unsigned OriginalStartColumn = SourceMgr.getSpellingColumnNumber(
    Current.FormatTok.getStartOfNonWhitespace()) -
                               1;

After:
unsigned OriginalStartColumn =
    SourceMgr.getSpellingColumnNumber(
        Current.FormatTok.getStartOfNonWhitespace()) -
    1;

llvm-svn: 182733
2013-05-27 12:45:09 +00:00
Manuel Klimek 75081b5cf8 Address post-review comment from dblakie.
llvm-svn: 182732
2013-05-27 12:36:28 +00:00