to MCObjectWriter's constructor.
MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.
llvm-svn: 315245
We end up creating COPY's that are either truncating/extending and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
Summary:
On behalf of julia.koval@intel.com
The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)
There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).
The example of special case code:
```
void foo(unsigned short *p, int max, int n) {
int i;
unsigned m;
for (i = 0; i < n; i++) {
m = *--p;
*p = (unsigned short)(m >= max ? m-max : 0);
}
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.
Here is the table of types, I try to support, special case items are bold:
| Size | 128 | 256 | 512
| ----- | ----- | ----- | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**
Reviewers: zvi, spatel, DavidKreitzer, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37534
llvm-svn: 315237
Microsoft's debug implementation of std::copy checks if the destination is an
array and then does some bounds checking. This was causing an assertion
failure in fs::rename_internal which copies to a buffer of the appropriate
size but that's type-punned to an array of length 1 for API compatibility
reasons.
Fix is to make make the destination a pointer rather than an array.
llvm-svn: 315222
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.
rdar://28495949
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D38160
llvm-svn: 315205
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.
Differential Revision: https://reviews.llvm.org/D38609
llvm-svn: 315201
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.
Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.
llvm-svn: 315195
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181
Adding the scheduling information for the SkylakeServer (SKX) target.
This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443
Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
After the original commit ([[ https://reviews.llvm.org/rL304088 | rL304088 ]]) was reverted, a discussion in llvm-dev was opened on 'how to accomplish this task'.
In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is:
# Commit the tablegen backend disabled by default.
# Proceed with an incremental updating of the manual tables - while checking the validity of each added entry.
# Repeat previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead.
# Schedule periodical (1 week/2 weeks/1 month) runs of the pass:
- if changes appear (new entries):
- make sure the entries are legal
- If they are not, mark them as illegal to folding
- Commit the changes (if there are any).
CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flags will run the pass and emit the X86GenFoldTables.inc file under build/lib/Target/X86/ directory which is a good reference for any developer who wants to take part in the effort of completing the current folding tables.
Differential Revision: https://reviews.llvm.org/D38028
llvm-svn: 315173
This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.
Differential Revision: https://reviews.llvm.org/D38027
llvm-svn: 315171
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.
Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.
Differential Revision: https://reviews.llvm.org/D38506
llvm-svn: 315150
The SjLj intrinsics in the X86 backend are intended for use with
SjLj exception handling as well, since SVN r271244.
Differential Revision: https://reviews.llvm.org/D38532
llvm-svn: 315146
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!
To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.
Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.
Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda
Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic
Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb
Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
llvm-svn: 315136
There's at least one bug here - this code can fail with vector types (PR34856).
It's also being called for FREM; I'm still trying to understand how that is valid.
llvm-svn: 315127
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.
Differential Revision: https://reviews.llvm.org/D38649
llvm-svn: 315122
This patch adds two new verifiers:
- It checks that the root DIE of a CU is actually a valid unit DIE.
(based on its tag)
- For DWARF5 which contains a unit type int he CU header, it checks that
this matches the type of the unit DIE.
Differential revision: https://reviews.llvm.org/D38453
llvm-svn: 315121
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870
Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:
C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
_Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
I am attempting to reproduce this now.
This reverts r315044
llvm-svn: 315108
Summary:
After r303360, we initialize UsesCalleeSaves in runOnMachineFunction,
which runs after getRequiredProperties. UsesCalleeSaves was initialized
to 'false', so getRequiredProperties would always return an empty set.
We don't have a TargetMachine available early anymore after r303360.
Just removing the requirement of NoVRegs seems to make things work, so
let's do that.
Reviewers: thegameg, dschuff, MatzeB
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D38597
llvm-svn: 315089
The bitcode reader looks specifically for `__DATA, __objc_catlist` as a
section name. However, SVN r304661 removed the spaces (the two names
are functionally equivalent but do not compare equally
lexicographically). This causes compatibility issues. Add an
auto-upgrade path for removing the spaces as well as use the new name in
the LTO plugin.
llvm-svn: 315086
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.
Passes OpenCL conformance test_integer_ops quick_[u]long_math
Differential Revision: https://reviews.llvm.org/D38607
llvm-svn: 315081
The FrameInfo cannot be stored directly in the vector because chained
frames may refer to parent frames, so we need pointers that are stable
across a vector resize.
llvm-svn: 315080
The current implementation of rename uses ReplaceFile if the
destination file already exists. According to the documentation for
ReplaceFile, the source file is opened without a sharing mode. This
means that there is a short interval of time between when ReplaceFile
renames the file and when it closes the file during which the
destination file cannot be opened.
This behaviour is not POSIX compliant because rename is supposed
to be atomic. It was also causing intermittent link failures when
linking with a ThinLTO cache; the ThinLTO cache implementation expects
all cache files to be openable.
This patch addresses that problem by re-implementing rename
using CreateFile and SetFileInformationByHandle. It is roughly a
reimplementation of ReplaceFile with a better sharing policy as well
as support for renaming in the case where the destination file does
not exist.
This implementation is still not fully POSIX. Specifically in the case
where the destination file is open at the point when rename is called,
there will be a short interval of time during which the destination
file will not exist. It isn't clear whether it is possible to avoid
this using the Windows API.
Differential Revision: https://reviews.llvm.org/D38570
llvm-svn: 315079
Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38603
llvm-svn: 315077
Unfortunately TableGen doesn't handle this yet:
Unable to deduce gMIR opcode to handle Src (which is a leaf).
Just add some temporary hand-written code to generate the proper MOVsr.
llvm-svn: 315071
Summary:
Xcode's dsymutil emits a __swift_ast DWARF section, which is required for debugging,
and which contains a byte-for-byte dump of the swiftmodule file.
Add this feature to llvm-dsymutil.
Tested with `gobjdump --dwarf=info -s`, by verifying that the contents of
`__DWARF.__swift_ast` match between Xcode's dsymutil and llvm-dsymutil
(Xcode's dwarfdump and llvm-dwarfdump don't currently recognize the
__swift_ast section).
Reviewers: aprantl, friss
Subscribers: llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D38504
llvm-svn: 315066
The machine scheduler (before register allocation) is enabled by default for
SystemZ.
The SelectionDAG scheduling preference now becomes source order scheduling
(was regpressure).
Review: Ulrich Weigand
https://reviews.llvm.org/D37977
llvm-svn: 315063