Summary:
When the code is compiled for arm32 and the builtin `__va_list` declaration is created by `CreateAAPCSABIBuiltinVaListDecl`, the declaration is not saved in the `ASTContext` which may lead to a compilation error or crash.
Minimal reproducer I was able to find:
**header.h**
```
#include <stdarg.h>
typedef va_list va_list_1;
```
**test.cpp**
```
typedef __builtin_va_list va_list_2;
void foo(const char* format, ...) { va_list args; va_start( args, format ); }
```
Steps to reproduce:
```
clang -x c++-header --target=armv7l-linux-eabihf header.h
clang -c -include header.h --target=armv7l-linux-eabihf test.cpp
```
Compilation error:
```
error: non-const lvalue reference to type '__builtin_va_list'
cannot bind to a value of unrelated type 'va_list' (aka '__builtin_va_list')
```
Compiling the same code as a C source leads to a crash:
```
clang --target=armv7l-linux-eabihf header.h
clang -c -x c -include header.h --target=armv7l-linux-eabihf test.cpp
```
Reviewers: logan, rsmith
Subscribers: cfe-commits, asl, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D18557
llvm-svn: 264930
For the same reason as the corresponding load change.
Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.
llvm-svn: 264928
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.
Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.
llvm-svn: 264927
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.
This fixes PR27133.
llvm-svn: 264926
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.
llvm-svn: 264922
Summary:
IsOverload has a param named UseUsingDeclRules. But as far as I can
tell, it should be called UseMemberUsingDeclRules. That is, it only
applies to "using" declarations inside classes or structs.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18538
llvm-svn: 264920
Summary:
Currently it's a module pass. Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18615
llvm-svn: 264919
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
Pretty mechanical change here. Just replacing all the std::error_code() with
llvm::Error() and make_dynamic_error_code with make_error<GenericError>
llvm-svn: 264917
The TailDup transform was removed in r138841 in 2011, along with most
of the tests for it. This test, however, was missed. Probably because
it had already been XFAIL'd for 3 years at that point (since r52243!)
and continued to fail when the opt flag for -tailduplicate stopped
being valid.
llvm-svn: 264916
Windows seems to complain that the file cannot be removed because
it is still in use. We don't have to remove the file but instead
just overwrite it, so do that.
llvm-svn: 264915
In some cases a slot for an identifier is requested but it gets written to
another module, causing an assertion.
At the point when we start serializing Rtypes, we have no imported IdentifierID
for float_round_style. We start serializing stuff and allocate an ID for it.
Then, during the serialization process, we pull in the identifier info for it
from TSchemaHelper. Finally, WriteIdentifierTable decides that the identifier
has not changed since it was deserialized, so doesn't emit it.
Fixes https://llvm.org/bugs/show_bug.cgi?id=27041
Discussed on IRC with Richard Smith. Agreed on post commit review if needed.
llvm-svn: 264913
Adds a GenericError class to lld/Core which can carry a string. This is
analygous to the dynamic_error we currently use in lld/Core.
Use this GenericError instead of make_dynamic_error_code. Also, provide
an implemention of GenericError::convertToErrorCode which for now converts
it in to the dynamic_error_code we used to have. This will go away once
all the APIs are converted.
llvm-svn: 264910
1 - DWARF in .o files with debug map in executable: we would place the compile unit index in the upper 32 bits of the 64 bit value and the lower 32 bits would be the DIE offset
2 - DWO: we would place the compile unit offset in the upper 32 bits of the 64 bit value and the lower 32 bits would be the DIE offset
There was a mixing and matching of this and it wasn't done consistently.
Major changes include:
The DIERef constructor that takes a lldb::user_id_t now requires a SymbolFileDWARF:
DIERef(lldb::user_id_t uid, SymbolFileDWARF *dwarf)
It is needed so that it can be decoded correctly. If it is DWARF in .o files with debug map in executable, then we get the right compile unit from the SymbolFileDWARFDebugMap, otherwise, we use the compile unit offset and DIE offset for DWO or normal DWARF.
The function:
lldb::user_id_t DIERef::GetUID() const;
Now becomes
lldb::user_id_t DIERef::GetUID(SymbolFileDWARF *dwarf) const;
Again, we need the DWARF file to encode it correctly.
This removes the need for "lldb::user_id_t SymbolFileDWARF::MakeUserID() const" and for bool SymbolFileDWARF::UserIDMatches (lldb::user_id_t uid) const". There were also many places were doing things inneficiently like:
1 - encode a dw_offset_t into a lldb::user_id_t
2 - call the public SymbolFile interface to resolve types using the lldb::user_id_t
3 - This would then decode the lldb::user_id_t into a DIERef, and then try to find that type.
There are many places that are now doing this more efficiently by storing DW_AT_type form values as DWARFFormValue objects and then making a DIERef from them and directly calling the underlying function to resolve the lldb_private::Type, lldb_private::CompilerType, lldb_private::CompilerDecl, lldb_private::CompilerDeclContext.
If there are any regressions in DWARF with DWO, we will need to fix any issues that arise since the original patch wasn't functional for the much more widely used DWARF in .o files with debug map.
<rdar://problem/25200976>
llvm-svn: 264909
There is code under review that requires StringMap to have a copy constructor,
and this makes StringMap more consistent with our other containers (like
DenseMap) that have copy constructors.
Differential Revision: http://reviews.llvm.org/D18506
llvm-svn: 264906
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:
LV: The Smallest and Widest types: 32 / 32 bits.
LV: The Widest register is: 256 bits.
LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Scalar loop costs: 3.
LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 2 costs: 2.
LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32
LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4
LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body
LV: Vector loop of width 4 costs: 1.
...
LV: Selecting VF: 8.
LV: The target has 32 registers
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #4 Interval # 1
LV(REG): At #5 Interval # 1
LV(REG): VF = 8
The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.
The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.
While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.
This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.
Differential Revision: http://reviews.llvm.org/D18537
llvm-svn: 264904
This patch add a TLS relax optimization test when transforming
Initial-Exec to Local-Exec for local symbols (which can not be preempted).
llvm-svn: 264903
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.
Also add a new map that maps PGOFuncName's MD5 value to the function definition.
Differential Revision: http://reviews.llvm.org/D17895
llvm-svn: 264902
These caused LNT failures due to new assertions when running with
-polly-position=before-vectorizer -polly-process-unprofitable for:
FAIL: clamscan.compile_time
FAIL: cjpeg.compile_time
FAIL: consumer-jpeg.compile_time
FAIL: shapes.compile_time
FAIL: clamscan.execution_time
FAIL: cjpeg.execution_time
FAIL: consumer-jpeg.execution_time
FAIL: shapes.execution_time
The failures have been introduced by r264782, but r264789 had to be reverted
as it depended on the earlier patch.
llvm-svn: 264885
What we are really trying to do here is to figure out if we are using
the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
define a version macro that we can check directly. Instead I wrote a
check to see if char16_t works.
llvm-svn: 264881
Using ArrayRef in annotateValueSite's parameter instead of using an array
and it's size.
Differential Revision: http://reviews.llvm.org/D18568
llvm-svn: 264879
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.
This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.
shader-db stats:
2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)
Depends on D18451
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18452
llvm-svn: 264876
For compatability with GAS, nop and nopr are recognized as alises for
bc and bcr, respectively. A mask of 0 turns these instructions
effectively into no-operations.
Reviewed by Ulrich Weigand.
llvm-svn: 264875
BuiltinsSystemZ.def is extended to include the required processor
features per intrinsic.
New test test/CodeGen/builtins-systemz-error2.c that checks for
expected errors when instrinsics are used with a subtarget that does
not support the required feature (e.g. vector support).
Reviewed by Ulrich Weigand.
llvm-svn: 264873
These checks are redundant and can be removed
Reviewers: hans
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D18564
llvm-svn: 264872
This reverts commit r264869. I am seeing Windows bot failures due to the
"\" in the path being mishandled at some point (seems to be interpreted
wrongly at some point and llvm-as | llvm-dis is yielding some junk
characters). Need to investigate.
llvm-svn: 264871
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.
llvm-svn: 264870
Summary:
This change serializes out and in the SourceFileName to LLVM assembly
so that it is preserved through "llvm-dis | llvm-as". This is
necessary to ensure that the global identifiers created for local values
in the module summary index are the same even if the bitcode is
streamed out and read back from LLVM assembly.
Serializing the summary itself to LLVM assembly is in progress.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18588
llvm-svn: 264869
Modify these tests to ignore the source file name when looking for the
expected string. It was already catching the source file name once via
the ModuleID, and will catch it another time with an impending change to
LLVM to serialize out the module's SourceFileName.
llvm-svn: 264868
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.
llvm-svn: 264867