I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes.
llvm-svn: 316337
Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.
Differential Revision: https://reviews.llvm.org/D39169
llvm-svn: 316336
This is for consistency with lld-link, see https://reviews.llvm.org/D38972
Also give --version a help text so it shows up in --help / /? output (for
both clang-cl and regular clang).
llvm-svn: 316335
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.
Differential Revision: https://reviews.llvm.org/D38696
llvm-svn: 316331
Summary:
Support formatting formatv_objects.
While here, fix documentation about member-formatters, and attempted
perfect-forwarding (I think).
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38997
llvm-svn: 316330
This fixes exporting functions in the following cases:
- functions starting with an underscore in def files
- functions starting with an underscore, via dllexport attributes, for mingw
- fastcall and vectorcall functions when declared undecorated in def files
- vectorcall functions when declared decorated in def files
- stdcall functions when declared decorated in def files for mingw
This still exports the stdcall functions with the wrong name
in the normal msvc/link.exe mode, if declared with decoration in
the def file though (this is not a regression though). Exporting
functions via def files including decoration is not something I
believe is routinely done though, but is tested to try to match
link.exe's behaviour as far as easily possible.
Differential Revision: https://reviews.llvm.org/D39170
llvm-svn: 316317
This fixes exporting functions starting with an underscore, and
fully decorated fastcall/vectorcall functions.
Tests will be added in the lld repo.
Differential Revision: https://reviews.llvm.org/D39168
llvm-svn: 316316
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.
Differential Revision: https://reviews.llvm.org/D38952
llvm-svn: 316313
r315292 introduced a change that's supposed to consistently ignore
"dead" output sections when placing orphans. Unfortunately, that
change doesn't handle the special case when the orphan section is
second to last section and the last section is dead (e.g. because
it's being discarded) introducing a regression in some cases.
This change handles this case by using the same predicate when
checking the last section.
Differential Revision: https://reviews.llvm.org/D39172
llvm-svn: 316307
The support of R_PPC_ADDR16_HI improves ld compatibility and makes
things on par with RuntimeDyldELF that already implements this
relocation.
Patch by vit9696.
llvm-svn: 316306
We used to have a map from section piece offsets to section pieces
as a cache for binary search. But I found that the map took quite a
large amount of memory and didn't make linking faster. So, in this
patch, I removed the map.
This patch saves 566 MiB of RAM (2.019 GiB -> 1.453 GiB) when linking
clang with debug info, and the link time is 4% faster in that test case.
Thanks for Sean Silva for pointing this out.
llvm-svn: 316305
The overflow detection assertions were tautological due to truncation.
Adjust them to no longer be tautological.
Patch by Alex Langford!
llvm-svn: 316303
Summary:
It's unclear if this is the only thing we can do but at least this is consistent with the check
of address space agreement in `isBitCastable`.
The code is used at least in both instcombine and jumpthreading though
I could only find a way to trigger the invalid cast in instcombine.
Reviewers: loladiro, sanjoy, majnemer
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34335
llvm-svn: 316302
This avoids a hack for making it a no-op for windows.
Also explicitly check for _WIN32 instead of assuming it.
Differential Revision: https://reviews.llvm.org/D39156
llvm-svn: 316300
As discussed in D39011:
https://reviews.llvm.org/D39011
...replacing constants with a variable is inverting the transform done
by other IR passes, so we definitely don't want to do this early.
In fact, it's questionable whether this transform belongs in SimplifyCFG
at all. I'll look at moving this to codegen as a follow-up step.
llvm-svn: 316298
Summary: test/CodeGen/PowerPC/pr33093.ll uses both powerpc64 (big-endian) and powerpc64le while the former was unsupported.
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D39164
llvm-svn: 316297
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
This is similar to how we generate the VEX tables.
More fixes are still needed for the instructions that use EVEX.b (broadcast and embedded rounding).
llvm-svn: 316294
The missed canonicalization/optimization in the motivating test from PR34471 leads to very different codegen:
int switcher(int x) {
switch(x) {
case 17: return 17;
case 19: return 19;
case 42: return 42;
default: break;
}
return 0;
}
int comparator(int x) {
if (x == 17) return 17;
if (x == 19) return 19;
if (x == 42) return 42;
return 0;
}
For the first example, we use a bit-test optimization to avoid a series of compare-and-branch:
https://godbolt.org/g/BivDsw
Differential Revision: https://reviews.llvm.org/D39011
llvm-svn: 316293
In order to identify the copy deduction candidate, I considered two approaches:
- attempt to determine whether an implicit guide is a copy deduction candidate by checking certain properties of its subsituted parameter during overload-resolution.
- using one of the many bits (WillHaveBody) from FunctionDecl (that CXXDeductionGuideDecl inherits from) that are otherwise irrelevant for deduction guides
After some brittle gymnastics w the first strategy, I settled on the second, although to avoid confusion and to give that bit a better name, i turned it into a member of an anonymous union.
Given this identification 'bit', the tweak to overload resolution was a simple reordering of the deduction guide checks (in SemaOverload.cpp::isBetterOverloadCandidate), in-line with Jason Merrill's p0620r0 drafting which made it into the working paper. Concordant with that, I made sure the copy deduction candidate is always added.
References:
See https://bugs.llvm.org/show_bug.cgi?id=34970
See http://wg21.link/p0620r0
llvm-svn: 316292
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.
Differential revision: https://reviews.llvm.org/D38143
llvm-svn: 316289
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.
This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.
Differential Revision:
https://reviews.llvm.org/D37251
Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288