For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and for COFF and Mach-O. Since the default clang -cc1 mode for ELF
is similar to -fpic ("PIC Level" is not set), this nuance causes unneeded
differences between binary formats. To make the emitted IR similar across formats,
ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.
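As a rough illustration (the function names here are hypothetical, not from the
patch), the difference amounts to whether a definition is marked dso_local in IR:

  ; Interposition allowed (ELF -fpic -fsemantic-interposition): the
  ; definition may be preempted at runtime, so dso_local is not set.
  define i32 @interposable() {
    ret i32 0
  }

  ; ELF -fno-pic, or -fpic with -fno-semantic-interposition, as well as
  ; COFF and Mach-O: the definition is known to resolve within the
  ; linkage unit, so dso_local is set and direct accesses can be used.
  define dso_local i32 @local_def() {
    ret i32 0
  }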
To make this flip smooth, and to enable a possible future change (making dso_local
the default for definitions), this patch replaces `define ` with `define{{.*}} `
for functions, replaces `= ` with `={{.*}} ` for variables/constants/aliases,
or inserts an appropriate `{{.*}} `.
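For instance, a hypothetical test with placeholder names @f and @g has its check
lines rewritten mechanically like this (the {{.*}} also tolerates other
attributes appearing in that position):

  ; Before: matches only IR without dso_local.
  ; CHECK: define void @f(
  ; CHECK: @g = global i32 0

  ; After: also matches IR where dso_local (or another attribute) appears.
  ; CHECK: define{{.*}} void @f(
  ; CHECK: @g ={{.*}} global i32 0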
Summary:
Clang -fpic defaults to -fno-semantic-interposition (GCC -fpic defaults
to -fsemantic-interposition).
Users need to specify -fsemantic-interposition to get semantic
interposition behavior.
Semantic interposition is currently a best-effort feature. There may
still be some cases where it is not handled well.
Reviewers: peter.smith, rnk, serge-sans-paille, sfertile, jfb, jdoerfert
Subscribers: dschuff, jyknight, dylanmckay, nemanjai, jvesely, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, arphaman, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73865
both LE and BE targets.
AFAICT, Clang gets this correct for PPC64. I've compared it to GCC 4.8
output for PPC64 (thanks Roman!) and, to my limited ability to read Power
assembly, it looks functionally equivalent. It would be really good to
fill in the assertions on this test case for x86-32, PPC32, ARM, etc.,
but I've reached the limit of my time and energy... Hopefully other
folks can chip in as it would be good to have this in place to test any
subsequent changes.
To those who care about PPC64 performance, a side note: there is some
*obnoxiously* bad code generated for these test cases. It would be worth
someone's time to sit down and teach the PPC backend to pattern match
these IR constructs better. It appears that things like '(shr %foo,
<imm>)' turn into 'rldicl R, R, 64-<imm>, <imm>' or some such. They
don't even get combined with *immediately adjacent* 'rldicl'
instructions. I'll add a couple of these patterns to the README, but
I think it would be better to look at all the patterns produced by this
and other bitfield access code, and systematically build up a collection
of patterns that efficiently reduce them to the minimal code.
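For reference, here is a minimal IR sketch of the construct in question (the
function name, shift amount, and mask width are made up). Per the observation
above, the shift alone maps to an rldicl-style rotate-and-mask, the mask is also
expressible that way, and the two could be folded into a single rldicl:

  define i64 @extract(i64 %foo) {
  entry:
    %shifted = lshr i64 %foo, 13    ; reportedly becomes rldicl R, R, 64-13, 13
    %masked = and i64 %shifted, 31  ; keep the low 5 bits
    ret i64 %masked
  }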
llvm-svn: 169693
This was an egregious bug due to the several iterations of refactoring
that took place. Size no longer meant what it originally did by the time
I finished, but this line of code never got updated. Unfortunately we
had essentially zero tests for this in the regression test suite. =[
I've added a PPC64 run over the bitfield test case I've been primarily
using. I'm still looking at adding more tests and making sure this is
the *correct* bitfield access code on PPC64 linux, but it looks pretty
close to me, and it is *worlds* better than before this patch as it no
longer asserts! =] More commits to follow with at least additional tests
and maybe more fixes.
Sorry for the long breakage due to this....
llvm-svn: 169691
generally support the C++11 memory model requirements for bitfield
accesses by relying more heavily on LLVM's memory model.
The primary change this introduces is to move from a manually aligned
and strided access pattern across the bits of the bitfield to a much
simpler lump access of all bits in the bitfield followed by math to
extract the bits relevant for the particular field.
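Concretely (this example is hypothetical, uses present-day IR syntax, and assumes
a little-endian layout), loading field b of `struct S { unsigned a : 3;
unsigned b : 5; };` now looks roughly like:

  %struct.S = type { i8 }

  define i32 @read_b(ptr %s) {
  entry:
    %bits = load i8, ptr %s, align 1  ; lump access of the whole storage unit
    %hi = lshr i8 %bits, 3            ; discard the 3 bits belonging to a
    %val = zext i8 %hi to i32         ; b is the remaining 5 bits
    ret i32 %val
  }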
This simplifies the code significantly, but relies on LLVM to lower
these integer operations intelligently.
I have tested LLVM's lowering both synthetically and in benchmarks. The
lowering appears to be functional, and there are no really significant
performance regressions. Different code patterns accessing bitfields
will vary in how this impacts them. The only real regressions I'm seeing
are a few patterns where LLVM's code generation for loads that feed
directly into a mask operation doesn't take advantage of x86's ability
to do a smaller load and a cheap zero-extension. This doesn't regress
any benchmark in the nightly test suite on my box past the noise
threshold, but my box is quite noisy. I'll be watching the LNT numbers,
and will look into further improvements to the LLVM lowering as needed.
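A sketch of the pattern in question (names are made up): a wide load whose only
use is a low-bits mask, which on x86 could in principle be a byte load plus a
cheap zero-extension (e.g. movzbl) instead of a full-width load followed by an
'and':

  define i64 @load_low_byte(ptr %p) {
  entry:
    %wide = load i64, ptr %p, align 8
    %low = and i64 %wide, 255
    ret i64 %low
  }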
llvm-svn: 169489