Commit Graph

13 Commits

Author SHA1 Message Date
Roman Lebedev a5baf86744 AMD BdVer2 (Piledriver) Initial Scheduler model
Summary:
# Overview
This is somewhat partial.
* Latencies are good {F7371125}
  * All of these remaining inconsistencies //appear// to be noise/noisy/flaky.
* NumMicroOps are somewhat good {F7371158}
  * Most of the remaining inconsistencies are from `Ld` / `Ld_ReadAfterLd` classes
* Actual unit occupation (pipes, `ResourceCycles`) are undiscovered lands, i did not really look there.
  They are basically verbatum copy from `btver2`
* Many `InstRW`. And there are still inconsistencies left...

To be noted:
I think this is the first new schedule profile produced with the new next-gen tools like llvm-exegesis!

# Benchmark
I realize that isn't what was suggested, but i'll start with some "internal" public real-world benchmark i understand - [[ https://github.com/darktable-org/rawspeed | RawSpeed raw image decoding library ]].
Diff (the exact clang from trunk without/with this patch):
```
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                                                        Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_mean                              -0.0607         -0.0604           234           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_median                            -0.0630         -0.0626           233           219           233           219
Canon/EOS 5D Mark II/09.canon.sraw1.cr2/threads:8/real_time_stddev                            +0.2581         +0.2587             1             2             1             2
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_pvalue                             0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_mean                              -0.0770         -0.0767           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_median                            -0.0767         -0.0763           144           133           144           133
Canon/EOS 5D Mark II/10.canon.sraw2.cr2/threads:8/real_time_stddev                            -0.4170         -0.4156             1             0             1             0
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_pvalue                                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_mean                                           -0.0271         -0.0270           463           450           463           450
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_median                                         -0.0093         -0.0093           453           449           453           449
Canon/EOS 5DS/2K4A9927.CR2/threads:8/real_time_stddev                                         -0.7280         -0.7280            13             4            13             4
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_mean                                           -0.0065         -0.0065           569           565           569           565
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_median                                         -0.0077         -0.0077           569           564           569           564
Canon/EOS 5DS/2K4A9928.CR2/threads:8/real_time_stddev                                         +1.0077         +1.0068             2             5             2             5
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_pvalue                                          0.0220          0.0199      U Test, Repetitions: 25 vs 25
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_mean                                           +0.0006         +0.0007           312           312           312           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_median                                         +0.0031         +0.0032           311           312           311           312
Canon/EOS 5DS/2K4A9929.CR2/threads:8/real_time_stddev                                         -0.7069         -0.7072             4             1             4             1
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_pvalue                                          0.0004          0.0004      U Test, Repetitions: 25 vs 25
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_mean                                           -0.0015         -0.0015           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_median                                         -0.0010         -0.0011           141           141           141           141
Canon/EOS 10D/CRW_7673.CRW/threads:8/real_time_stddev                                         -0.1486         -0.1456             0             0             0             0
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_pvalue                                          0.6139          0.8766      U Test, Repetitions: 25 vs 25
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_mean                                           -0.0008         -0.0005            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_median                                         -0.0006         -0.0002            60            60            60            60
Canon/EOS 40D/_MG_0154.CR2/threads:8/real_time_stddev                                         -0.1467         -0.1390             0             0             0             0
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_pvalue                                          0.0137          0.0137      U Test, Repetitions: 25 vs 25
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_mean                                           +0.0002         +0.0002           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_median                                         -0.0015         -0.0014           275           275           275           275
Canon/EOS 77D/IMG_4049.CR2/threads:8/real_time_stddev                                         +3.3687         +3.3587             0             2             0             2
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_pvalue                                     0.4041          0.3933      U Test, Repetitions: 25 vs 25
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_mean                                      +0.0004         +0.0004            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_median                                    -0.0000         -0.0000            67            67            67            67
Canon/PowerShot G1/crw_1693.crw/threads:8/real_time_stddev                                    +0.1947         +0.1995             0             0             0             0
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_pvalue                              0.0074          0.0001      U Test, Repetitions: 25 vs 25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_mean                               -0.0092         +0.0074           547           542            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_median                             -0.0054         +0.0115           544           541            25            25
Fujifilm/GFX 50S/20170525_0037TEST.RAF/threads:8/real_time_stddev                             -0.4086         -0.3486             8             5             0             0
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_pvalue                                        0.3320          0.0000      U Test, Repetitions: 25 vs 25
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_mean                                         +0.0015         +0.0204           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_median                                       +0.0001         +0.0203           218           218            12            12
Fujifilm/X-Pro2/_DSF3051.RAF/threads:8/real_time_stddev                                       +0.2259         +0.2023             1             1             0             0
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_pvalue                                      0.0000          0.0001      U Test, Repetitions: 25 vs 25
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_mean                                       -0.0209         -0.0179            96            94            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_median                                     -0.0182         -0.0155            95            93            90            88
GoPro/HERO6 Black/GOPR9172.GPR/threads:8/real_time_stddev                                     -0.6164         -0.2703             2             1             2             1
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_pvalue                                     0.0000          0.0000      U Test, Repetitions: 25 vs 25
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_mean                                      -0.0098         -0.0098           176           175           176           175
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_median                                    -0.0126         -0.0126           176           174           176           174
Kodak/DCS Pro 14nx/D7465857.DCR/threads:8/real_time_stddev                                    +6.9789         +6.9157             0             2             0             2
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 25 vs 25
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_mean                  -0.0237         -0.0238           474           463           474           463
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_median                -0.0267         -0.0267           473           461           473           461
Nikon/D850/Nikon-D850-14bit-lossless-compressed.NEF/threads:8/real_time_stddev                +0.7179         +0.7178             3             5             3             5
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_pvalue                   0.6837          0.6554      U Test, Repetitions: 25 vs 25
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_mean                    -0.0014         -0.0013          1375          1373          1375          1373
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_median                  +0.0018         +0.0019          1371          1374          1371          1374
Olympus/E-M1MarkII/Olympus_EM1mk2__HIRES_50MP.ORF/threads:8/real_time_stddev                  -0.7457         -0.7382            11             3            10             3
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_pvalue                                        0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_mean                                         -0.0080         -0.0289            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_median                                       -0.0070         -0.0287            22            22            10            10
Panasonic/DC-G9/P1000476.RW2/threads:8/real_time_stddev                                       +1.0977         +0.6614             0             0             0             0
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_pvalue                                       0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_mean                                        +0.0132         +0.0967            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_median                                      +0.0132         +0.0956            35            36            10            11
Panasonic/DC-GH5/_T012014.RW2/threads:8/real_time_stddev                                      -0.0407         -0.1695             0             0             0             0
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_pvalue                                      0.0000          0.0000      U Test, Repetitions: 25 vs 25
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_mean                                       +0.0331         +0.1307            13            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_median                                     +0.0430         +0.1373            12            13             6             6
Panasonic/DC-GH5S/P1022085.RW2/threads:8/real_time_stddev                                     -0.9006         -0.8847             1             0             0             0
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_pvalue                                            0.0016          0.0010      U Test, Repetitions: 25 vs 25
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_mean                                             -0.0023         -0.0024           395           394           395           394
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_median                                           -0.0029         -0.0030           395           394           395           393
Pentax/645Z/IMGP2837.PEF/threads:8/real_time_stddev                                           -0.0275         -0.0375             1             1             1             1
Phase One/P65/CF027310.IIQ/threads:8/real_time_pvalue                                          0.0232          0.0000      U Test, Repetitions: 25 vs 25
Phase One/P65/CF027310.IIQ/threads:8/real_time_mean                                           -0.0047         +0.0039           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_median                                         -0.0050         +0.0037           114           113            28            28
Phase One/P65/CF027310.IIQ/threads:8/real_time_stddev                                         -0.0599         -0.2683             1             1             0             0
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_pvalue                          0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_mean                           +0.0206         +0.0207           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_median                         +0.0204         +0.0205           405           414           405           414
Samsung/NX1/2016-07-23-142101_sam_9364.srw/threads:8/real_time_stddev                         +0.2155         +0.2212             1             1             1             1
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_pvalue                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_mean                          -0.0109         -0.0108           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_median                        -0.0104         -0.0103           147           145           147           145
Samsung/NX30/2015-03-07-163604_sam_7204.srw/threads:8/real_time_stddev                        -0.4919         -0.4800             0             0             0             0
Samsung/NX3000/_3184416.SRW/threads:8/real_time_pvalue                                         0.0000          0.0000      U Test, Repetitions: 25 vs 25
Samsung/NX3000/_3184416.SRW/threads:8/real_time_mean                                          -0.0149         -0.0147           220           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_median                                        -0.0173         -0.0169           221           217           220           217
Samsung/NX3000/_3184416.SRW/threads:8/real_time_stddev                                        +1.0337         +1.0341             1             3             1             3
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_pvalue                                         0.0001          0.0001      U Test, Repetitions: 25 vs 25
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_mean                                          -0.0019         -0.0019           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_median                                        -0.0021         -0.0021           194           193           194           193
Sony/DSLR-A350/DSC05472.ARW/threads:8/real_time_stddev                                        -0.4441         -0.4282             0             0             0             0
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_pvalue                                0.0000          0.4263      U Test, Repetitions: 25 vs 25
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_mean                                 +0.0258         -0.0006            81            83            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_median                               +0.0235         -0.0011            81            82            19            19
Sony/ILCE-7RM2/14-bit-compressed.ARW/threads:8/real_time_stddev                               +0.1634         +0.1070             1             1             0             0
```
{F7443905}
If we look at the `_mean`s, the time column, the biggest win is `-7.7%` (`Canon/EOS 5D Mark II/10.canon.sraw2.cr2`),
and the biggest loose is `+3.3%` (`Panasonic/DC-GH5S/P1022085.RW2`);
Overall: mean `-0.7436%`, median `-0.23%`, `cbrt(sum(time^3))` = `-8.73%`
Looks good so far i'd say.

llvm-exegesis details:
{F7371117} {F7371125}
{F7371128} {F7371144} {F7371158}

Reviewers: craig.topper, RKSimon, andreadb, courbet, avt77, spatel, GGanesh

Reviewed By: andreadb

Subscribers: javed.absar, gbedwell, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52779

llvm-svn: 345463
2018-10-27 20:46:30 +00:00
Roman Lebedev a51921877a [NFC][X86] Baseline tests for AMD BdVer2 (Piledriver) Scheduler model
Adding the baseline tests in a preparatory NFC commit,
so that the actual commit shows the *diff*.

Yes, i'm aware that a few of these codegen-based sched tests
are testing wrong instructions, i will fix that afterwards.

For https://reviews.llvm.org/D52779

llvm-svn: 345462
2018-10-27 20:36:11 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Sanjay Patel e57f3c0a42 Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360

llvm-svn: 222925
2014-11-28 18:40:18 +00:00
Evan Cheng a048c83fe4 Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.
llvm-svn: 123015
2011-01-07 19:35:30 +00:00
Evan Cheng 7998b1d6fe Use movups to lower memcpy and memset even if it's not fast (like corei7).
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955
2011-01-06 07:58:36 +00:00
Evan Cheng 1e8ee79957 Add -mcpu to memcpy / memset tests to ensure they behave the same on all hosts / targets.
llvm-svn: 100101
2010-04-01 08:25:26 +00:00
Evan Cheng 43cd9e3845 Fix sdisel memcpy, memset, memmove lowering:
1. Makes it possible to lower with floating point loads and stores.
2. Avoid unaligned loads / stores unless it's fast.
3. Fix some memcpy lowering logic bug related to when to optimize a
   load from constant string into a constant.
4. Adjust x86 memcpy lowering threshold to make it more sane.
5. Fix x86 target hook so it uses vector and floating point memory
   ops more effectively.
rdar://7774704

llvm-svn: 100090
2010-04-01 06:04:33 +00:00
Dan Gohman 40503396da Eliminate more uses of llvm-as and llvm-dis.
llvm-svn: 81290
2009-09-08 23:54:48 +00:00
Dan Gohman a5b9645c4b Split the Add, Sub, and Mul instruction opcodes into separate
integer and floating-point opcodes, introducing
FAdd, FSub, and FMul.

For now, the AsmParser, BitcodeReader, and IRBuilder all preserve
backwards compatability, and the Core LLVM APIs preserve backwards
compatibility for IR producers. Most front-ends won't need to change
immediately.

This implements the first step of the plan outlined here:
http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt

llvm-svn: 72897
2009-06-04 22:49:04 +00:00
Dan Gohman 544ab2c50b Drop ISD::MEMSET, ISD::MEMMOVE, and ISD::MEMCPY, which are not Legal
on any current target and aren't optimized in DAGCombiner. Instead
of using intermediate nodes, expand the operations, choosing between
simple loads/stores, target-specific code, and library calls,
immediately.

Previously, the code to emit optimized code for these operations
was only used at initial SelectionDAG construction time; now it is
used at all times. This fixes some cases where rep;movs was being
used for small copies where simple loads/stores would be better.

This also cleans up code that checks for alignments less than 4;
let the targets make that decision instead of doing it in
target-independent code. This allows x86 to use rep;movs in
low-alignment cases.

Also, this fixes a bug that resulted in the use of rep;stos for
memsets of 0 with non-constant memory size when the alignment was
at least 4. It's better to use the library in this case, which
can be significantly faster when the size is large.

This also preserves more SourceValue information when memory
intrinsics are lowered into simple loads/stores.

llvm-svn: 49572
2008-04-12 04:36:06 +00:00