==============================
|
|
Moving LLVM Projects to GitHub
|
|
==============================
|
|
|
|
Current Status
|
|
==============
|
|
|
|
We are planning to complete the transition to GitHub by Oct 21, 2019. See
|
|
the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
|
|
for the latest updates and instructions for how to migrate your workflows.
|
|
|
|
.. contents:: Table of Contents
|
|
:depth: 4
|
|
:local:
|
|
|
|
Introduction
|
|
============
|
|
|
|
This is a proposal to move our current revision control system from our own
|
|
hosted Subversion to GitHub. Below are the financial and technical arguments as
|
|
to why we are proposing such a move and how people (and validation
|
|
infrastructure) will continue to work with a Git-based LLVM.
|
|
|
|
What This Proposal is *Not* About
|
|
=================================
|
|
|
|
Changing the development policy.
|
|
|
|
This proposal relates only to moving the hosting of our source-code repository
|
|
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
|
|
using GitHub's issue tracker, pull-requests, or code-review.
|
|
|
|
Contributors will continue to earn commit access on demand under the Developer
|
|
Policy, except that a GitHub account will be required instead of SVN
|
|
username/password-hash.
|
|
|
|
Why Git, and Why GitHub?
|
|
========================
|
|
|
|
Why Move At All?
|
|
----------------
|
|
|
|
This discussion began because we currently host our own Subversion server
|
|
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
|
|
provides limited support, but there is only so much it can do.
|
|
|
|
Volunteers are not sysadmins themselves, but compiler engineers that happen
|
|
to know a thing or two about hosting servers. We also don't have 24/7 support,
|
|
and we sometimes wake up to see that continuous integration is broken because
|
|
the SVN server is either down or unresponsive.
|
|
|
|
We should take advantage of one of the services out there (GitHub, GitLab,
|
|
and BitBucket, among others) that offer better service (24/7 stability, disk
|
|
space, Git server, code browsing, forking facilities, etc.) for free.
|
|
|
|
Why Git?
|
|
--------
|
|
|
|
Many new coders nowadays start with Git, and a lot of people have never used
|
|
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
|
|
of open source contributions, reducing the cost of first contribution and
|
|
fostering collaboration.
|
|
|
|
Git is also the version control system many LLVM developers use. Despite the
|
|
sources being stored in an SVN server, these developers are already using Git
|
|
through the Git-SVN integration.
|
|
|
|
Git allows you to:
|
|
|
|
* Commit, squash, merge, and fork locally without touching the remote server.
|
|
* Maintain local branches, enabling multiple threads of development.
|
|
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
|
|
* Inspect the repository history (blame, log, bisect) without Internet access.
|
|
* Maintain remote forks and branches on Git hosting services and
|
|
integrate back to the main repository.
|
|
|
|
In addition, because Git seems to be replacing many OSS projects' version
|
|
control systems, there are many tools that are built over Git.
|
|
Future tooling may support Git first (if not only).
|
|
|
|
Why GitHub?
|
|
-----------
|
|
|
|
GitHub, like GitLab and BitBucket, provides free code hosting for open source
|
|
projects. Any of these could replace the code-hosting infrastructure that we
|
|
have today.
|
|
|
|
These services also have a dedicated team to monitor, migrate, improve and
|
|
distribute the contents of the repositories depending on region and load.
|
|
|
|
GitHub has one important advantage over GitLab and
|
|
BitBucket: it offers read-write **SVN** access to the repository
|
|
(https://github.com/blog/626-announcing-svn-support).
|
|
This would enable people to continue working post-migration as though our code
|
|
were still canonically in an SVN repository.
|
|
|
|
In addition, there are already multiple LLVM mirrors on GitHub, indicating that
|
|
part of our community has already settled there.
|
|
|
|
On Managing Revision Numbers with Git
|
|
-------------------------------------
|
|
|
|
The current SVN repository hosts all the LLVM sub-projects alongside each other.
|
|
A single revision number (e.g. r123456) thus identifies a consistent version of
|
|
all LLVM sub-projects.
|
|
|
|
Git does not use sequential integer revision numbers but instead uses a hash to
|
|
identify each commit.
|
|
|
|
The loss of a sequential integer revision number has been a sticking point in
|
|
past discussions about Git:
|
|
|
|
- "The 'branch' I most care about is mainline, and losing the ability to say
|
|
'fixed in r1234' (with some sort of monotonically increasing number) would
|
|
be a tragic loss." [LattnerRevNum]_
|
|
- "I like those results sorted by time and the chronology should be obvious, but
|
|
timestamps are incredibly cumbersome and make it difficult to verify that a
|
|
given checkout matches a given set of results." [TrickRevNum]_
|
|
- "There is still the major regression with unreadable version numbers.
|
|
Given the amount of Bugzilla traffic with 'Fixed in...', that's a
|
|
non-trivial issue." [JSonnRevNum]_
|
|
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
|
|
|
|
However, Git can emulate this increasing revision number:
|
|
``git rev-list --count <commit-hash>``. This identifier is unique only
|
|
within a single branch, but this means the tuple `(num, branch-name)` uniquely
|
|
identifies a commit.
|
|
|
|
We can thus use this revision number to ensure that e.g. `clang -v` reports a
|
|
user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
|
|
the objections raised above with respect to this aspect of Git.
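
As a rough illustration of the scheme above, the following sketch builds a
throwaway repository and derives an identifier of the form ``<branch>-<count>``.
The repository, the commit messages, the demo identity, and the variable names
are assumptions made for the example; only ``git rev-list --count`` comes from
the proposal.

```shell
# Demo identity so the commits below work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with three commits (illustration only).
repo=$(mktemp -d)
git -C "$repo" init -q
for i in 1 2 3; do
  git -C "$repo" commit -q --allow-empty -m "commit $i"
done

# The branch name plus the number of commits reachable from HEAD.
branch=$(git -C "$repo" symbolic-ref --short HEAD)
count=$(git -C "$repo" rev-list --count HEAD)
rev_id="${branch}-${count}"
echo "$rev_id"
```

On a branch without merge commits, the count grows by one with every commit,
which is what makes it usable as a revision number.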
|
|
|
|
What About Branches and Merges?
|
|
-------------------------------
|
|
|
|
In contrast to SVN, Git makes branching easy. Git's commit history is
|
|
represented as a DAG, a departure from SVN's linear history. However, we propose
|
|
to forbid merge commits in our canonical Git repository.
|
|
|
|
Unfortunately, GitHub does not support server side hooks to enforce such a
|
|
policy. We must rely on the community to avoid pushing merge commits.
|
|
|
|
GitHub offers a feature called `Status Checks`: a branch protected by
|
|
`status checks` requires commits to be whitelisted before the push can happen.
|
|
We could supply a pre-push hook on the client side that would run and check the
|
|
history, before whitelisting the commit being pushed [statuschecks]_.
|
|
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and would prevent SVN access to the
|
|
repository.
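
The history check that such a client-side hook would run can be sketched as
follows. This is an illustration under stated assumptions, not the hook itself:
the helper name ``no_merge_commits``, the tags, and the throwaway repository are
hypothetical, and the sketch only relies on ``git rev-list --merges --count``.

```shell
# Demo identity so the commits below work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: succeeds when the revision range $2 in the
# repository $1 contains no merge commits.
no_merge_commits() {
  [ "$(git -C "$1" rev-list --merges --count "$2")" -eq 0 ]
}

# Throwaway repository: a linear history plus a side branch merged back in.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "base"
git -C "$repo" tag base
trunk=$(git -C "$repo" symbolic-ref --short HEAD)
git -C "$repo" commit -q --allow-empty -m "linear work"
git -C "$repo" tag linear
git -C "$repo" checkout -q -b side base
git -C "$repo" commit -q --allow-empty -m "side work"
git -C "$repo" checkout -q "$trunk"
git -C "$repo" merge -q --no-ff -m "merge side" side
git -C "$repo" tag merged

no_merge_commits "$repo" base..linear && echo "base..linear: ok to push"
no_merge_commits "$repo" base..merged || echo "base..merged: contains a merge"
```

A check like this only helps developers who have it installed, which is the
fragility noted above.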
|
|
|
|
What About Commit Emails?
|
|
-------------------------
|
|
|
|
We will need a new bot to send emails for each commit. This proposal leaves the
|
|
email format unchanged besides the commit URL.
|
|
|
|
Straw Man Migration Plan
|
|
========================
|
|
|
|
Step #1 : Before The Move
|
|
-------------------------
|
|
|
|
1. Update docs to mention the move, so people are aware of what is going on.
|
|
2. Set up a read-only version of the GitHub project, mirroring our current SVN
|
|
repository.
|
|
3. Add the required bots to implement the commit emails, as well as the
|
|
umbrella repository update (if the multirepo is selected) or the read-only
|
|
Git views for the sub-projects (if the monorepo is selected).
|
|
|
|
Step #2 : Git Move
|
|
------------------
|
|
|
|
4. Update the buildbots to pick up updates and commits from the GitHub
|
|
repository. Not all bots have to migrate at this point, but it'll help
|
|
provide infrastructure testing.
|
|
5. Update Phabricator to pick up commits from the GitHub repository.
|
|
6. LNT and llvmlab have to be updated: they rely on unique monotonically
|
|
increasing integers across branches [MatthewsRevNum]_.
|
|
7. Instruct downstream integrators to pick up commits from the GitHub
|
|
repository.
|
|
8. Review and prepare an update for the LLVM documentation.
|
|
|
|
Until this point nothing has changed for developers; it will just
|
|
boil down to a lot of work for buildbot and other infrastructure
|
|
owners.
|
|
|
|
The migration will pause here until all dependencies have cleared, and all
|
|
problems have been solved.
|
|
|
|
Step #3: Write Access Move
|
|
--------------------------
|
|
|
|
9. Collect developers' GitHub account information, and add them to the project.
|
|
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
|
|
11. Update the documentation.
|
|
12. Mirror Git to SVN.
|
|
|
|
Step #4 : Post Move
|
|
-------------------
|
|
|
|
13. Archive the SVN repository.
|
|
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
|
|
point to GitHub instead.
|
|
|
|
GitHub Repository Description
|
|
=============================
|
|
|
|
Monorepo
|
|
----------------
|
|
|
|
The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
|
|
sub-projects in a single source tree. It is often referred to as a monorepo and
|
|
mimics an export of the current SVN repository, with each sub-project having its
|
|
own top-level directory. Not all sub-projects are used for building toolchains.
|
|
For example, www/ and test-suite/ are not part of the monorepo.
|
|
|
|
Putting all sub-projects in a single checkout makes cross-project refactoring
|
|
naturally simple:
|
|
|
|
* New sub-projects can be trivially split out for better reuse and/or layering
|
|
(e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
|
|
dependency on LLVM).
|
|
* Changing an API in LLVM and upgrading the sub-projects will always be done in
|
|
a single commit, designing away a common source of temporary build breakage.
|
|
* Moving code across sub-projects (during refactoring for instance) in a single
|
|
commit enables accurate `git blame` when tracking code change history.
|
|
* Tooling based on `git grep` works natively across sub-projects, making it
  easier to find refactoring opportunities across projects (for example reusing
  a data structure initially in LLDB by moving it into libSupport).
|
|
* Having all the sources present encourages maintaining the other sub-projects
|
|
when changing an API.
|
|
|
|
Finally, the monorepo maintains the property of the existing SVN repository that
|
|
the sub-projects move synchronously, and a single revision number (or commit
|
|
hash) identifies the state of the development across all projects.
|
|
|
|
.. _build_single_project:
|
|
|
|
Building a single sub-project
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Even though there is a single source tree, you are not required to build
|
|
all sub-projects together. It is trivial to configure builds for a single
|
|
sub-project.
|
|
|
|
For example::
|
|
|
|
mkdir build && cd build
|
|
# Configure only LLVM (default)
|
|
cmake path/to/monorepo
|
|
# Configure LLVM and lld
|
|
cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
|
|
# Configure LLVM and clang
|
|
cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
|
|
|
|
.. _git-svn-mirror:
|
|
|
|
Outstanding Questions
|
|
---------------------
|
|
|
|
Read-only sub-project mirrors
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
With the Monorepo, it is undecided whether the existing single-subproject
|
|
mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
|
|
be maintained.
|
|
|
|
Read/write SVN bridge
|
|
^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
GitHub supports a read/write SVN bridge for its repositories. However,
|
|
there have been issues with this bridge working correctly in the past,
|
|
so it's not clear if this is something that will be supported going forward.
|
|
|
|
Monorepo Drawbacks
|
|
------------------
|
|
|
|
* Using the monolithic repository may add overhead for those contributing to a
|
|
standalone sub-project, particularly on runtimes like libcxx and compiler-rt
|
|
that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
|
|
1GB for the monorepo), and the commit rate of LLVM may cause more frequent
|
|
`git push` collisions when upstreaming. Affected contributors may be able to
|
|
use the SVN bridge or the single-subproject Git mirrors. However, it's
|
|
undecided if these projects will continue to be maintained.
|
|
* Using the monolithic repository may add overhead for those *integrating* a
|
|
standalone sub-project, even if they aren't contributing to it, due to the
|
|
same disk space concern as the point above. The availability of the
|
|
sub-project Git mirrors would address this.
|
|
* Preservation of the existing read/write SVN-based workflows relies on the
|
|
GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
|
|
into GitHub and could restrict future workflow changes.
|
|
|
|
Workflows
|
|
^^^^^^^^^
|
|
|
|
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
|
|
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
|
|
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
|
|
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
|
|
* :ref:`Bisecting <workflow-mono-bisecting>`.
|
|
|
|
Workflow Before/After
|
|
=====================
|
|
|
|
This section goes through a few examples of workflows, intended to illustrate
|
|
how end-users or developers would interact with the repository for
|
|
various use-cases.
|
|
|
|
.. _workflow-checkout-commit:
|
|
|
|
Checkout/Clone a Single Project, with Commit Access
|
|
---------------------------------------------------
|
|
|
|
Currently
|
|
^^^^^^^^^
|
|
|
|
::
|
|
|
|
# direct SVN checkout
|
|
svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
|
|
# or using the read-only Git view, with git-svn
|
|
git clone https://llvm.org/git/llvm.git
|
|
cd llvm
|
|
git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
|
|
git config svn-remote.svn.fetch :refs/remotes/origin/master
|
|
git svn rebase -l # -l avoids fetching ahead of the git mirror.
|
|
|
|
Commits are performed using `svn commit` or with the sequence `git commit` and
|
|
`git svn dcommit`.
|
|
|
|
.. _workflow-multicheckout-nocommit:
|
|
|
|
Monorepo Variant
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
With the monorepo variant, there are a few options, depending on your
|
|
constraints. First, you could just clone the full repository::
|
|
|
|
git clone https://github.com/llvm/llvm-project.git
|
|
|
|
At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
|
|
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
|
|
can still build only compiler-rt for instance. In this way it's not different
|
|
from someone who would check out all the projects with SVN today.
|
|
|
|
If you want to avoid checking out all the sources, you can hide the other
|
|
directories using a Git sparse checkout::
|
|
|
|
git config core.sparseCheckout true
|
|
echo /compiler-rt > .git/info/sparse-checkout
|
|
git read-tree -mu HEAD
|
|
|
|
The data for all sub-projects is still in your `.git` directory, but in your
|
|
checkout, you only see `compiler-rt`.
|
|
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
|
|
usual.
|
|
|
|
Note that when you fetch you'll likely pull in changes to sub-projects you don't
|
|
care about. If you are using sparse checkout, the files from other projects
|
|
won't appear on your disk. The only effect is that your commit hash changes.
|
|
|
|
You can check whether the changes in the last fetch are relevant to your commit
|
|
by running::
|
|
|
|
git log origin/master@{1}..origin/master -- libcxx
|
|
|
|
This command can be hidden in a script so that `git llvmpush` would perform all
|
|
these steps, fail only if such a dependent change exists, and show immediately
|
|
the change that prevented the push. An immediate repeat of the command would
|
|
(almost) certainly result in a successful push.
|
|
Note that today with SVN or git-svn, this step is not possible since the
|
|
"rebase" implicitly happens while committing (unless a conflict occurs).
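
A minimal sketch of the check such a wrapper could perform is shown below. The
name ``git llvmpush`` is only suggested in the text above; the helper
``range_leaves_path_untouched``, the tags, and the throwaway repository are
hypothetical. A real wrapper would pass a range such as
``origin/master@{1}..origin/master`` instead of the demo tags.

```shell
# Demo identity so the commits below work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: succeeds when no commit in range $2 of the
# repository $1 touches the path $3.
range_leaves_path_untouched() {
  [ -z "$(git -C "$1" log --format=%H "$2" -- "$3")" ]
}

# Throwaway repository with commits in two sub-project directories.
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir "$repo/libcxx" "$repo/llvm"

commit_file() {  # $1: file path relative to the repository, $2: tag name
  echo "$2" > "$repo/$1"
  git -C "$repo" add "$1"
  git -C "$repo" commit -q -m "touch $1"
  git -C "$repo" tag "$2"
}

commit_file libcxx/a.txt start
commit_file llvm/b.txt llvm_only
commit_file libcxx/c.txt libcxx_again

if range_leaves_path_untouched "$repo" start..llvm_only libcxx; then
  echo "only unrelated changes were fetched; safe to retry the push"
fi
if ! range_leaves_path_untouched "$repo" start..libcxx_again libcxx; then
  echo "libcxx changed upstream; review before pushing"
fi
```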
|
|
|
|
Checkout/Clone Multiple Projects, with Commit Access
|
|
----------------------------------------------------
|
|
|
|
Let's look at how to assemble llvm+clang+libcxx at a given revision.
|
|
|
|
Currently
|
|
^^^^^^^^^
|
|
|
|
::
|
|
|
|
svn co https://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
|
|
cd llvm/tools
|
|
svn co https://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
|
|
cd ../projects
|
|
svn co https://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
|
|
|
|
Or using git-svn::
|
|
|
|
git clone https://llvm.org/git/llvm.git
|
|
cd llvm/
|
|
git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
|
|
git config svn-remote.svn.fetch :refs/remotes/origin/master
|
|
git svn rebase -l
|
|
git checkout `git svn find-rev -B r258109`
|
|
cd tools
|
|
git clone https://llvm.org/git/clang.git
|
|
cd clang/
|
|
git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
|
|
git config svn-remote.svn.fetch :refs/remotes/origin/master
|
|
git svn rebase -l
|
|
git checkout `git svn find-rev -B r258109`
|
|
cd ../../projects/
|
|
git clone https://llvm.org/git/libcxx.git
|
|
cd libcxx
|
|
git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
|
|
git config svn-remote.svn.fetch :refs/remotes/origin/master
|
|
git svn rebase -l
|
|
git checkout `git svn find-rev -B r258109`
|
|
|
|
Note that the list would be longer with more sub-projects.
|
|
|
|
.. _workflow-monocheckout-multicommit:
|
|
|
|
Monorepo Variant
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
The repository natively contains the source for every sub-project at the right
|
|
revision, which makes this straightforward::
|
|
|
|
git clone https://github.com/llvm/llvm-project.git
|
|
cd llvm-project
|
|
git checkout $REVISION
|
|
|
|
As before, at this point clang, llvm, and libcxx are stored in directories
|
|
alongside each other.
|
|
|
|
.. _workflow-cross-repo-commit:
|
|
|
|
Commit an API Change in LLVM and Update the Sub-projects
|
|
--------------------------------------------------------
|
|
|
|
Today this is possible, even though not common (at least not documented) for
|
|
subversion users and for git-svn users. For example, few Git users try to update
|
|
LLD or Clang in the same commit as they change an LLVM API.
|
|
|
|
The multirepo variant does not address this: one would have to commit and push
|
|
separately in every individual repository. It would be possible to establish a
|
|
protocol whereby users add a special token to their commit messages that causes
|
|
the umbrella repo's updater bot to group all of them into a single revision.
|
|
|
|
The monorepo variant handles this natively.
|
|
|
|
Branching/Stashing/Updating for Local Development or Experiments
|
|
----------------------------------------------------------------
|
|
|
|
Currently
|
|
^^^^^^^^^
|
|
|
|
SVN does not allow this use case, but developers that are currently using
|
|
git-svn can do it. Let's look at what this means in practice when dealing with
|
|
multiple sub-projects.
|
|
|
|
To update the repository to tip of trunk::
|
|
|
|
git pull
|
|
cd tools/clang
|
|
git pull
|
|
cd ../../projects/libcxx
|
|
git pull
|
|
|
|
To create a new branch::
|
|
|
|
git checkout -b MyBranch
|
|
cd tools/clang
|
|
git checkout -b MyBranch
|
|
cd ../../projects/libcxx
|
|
git checkout -b MyBranch
|
|
|
|
To switch branches::
|
|
|
|
git checkout AnotherBranch
|
|
cd tools/clang
|
|
git checkout AnotherBranch
|
|
cd ../../projects/libcxx
|
|
git checkout AnotherBranch
|
|
|
|
.. _workflow-mono-branching:
|
|
|
|
Monorepo Variant
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
Regular Git commands are sufficient, because everything is in a single
|
|
repository:
|
|
|
|
To update the repository to tip of trunk::
|
|
|
|
git pull
|
|
|
|
To create a new branch::
|
|
|
|
git checkout -b MyBranch
|
|
|
|
To switch branches::
|
|
|
|
git checkout AnotherBranch
|
|
|
|
Bisecting
|
|
---------
|
|
|
|
Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
|
|
|
|
Currently
|
|
^^^^^^^^^
|
|
|
|
SVN does not have built-in bisection support, but the single revision number
across sub-projects makes it possible to script one.
|
|
|
|
Using the existing Git read-only view of the repositories, it is possible to use
|
|
the native Git bisection script over the llvm repository, and use some scripting
|
|
to synchronize the clang repository to match the llvm revision.
|
|
|
|
.. _workflow-mono-bisecting:
|
|
|
|
Monorepo Variant
|
|
^^^^^^^^^^^^^^^^
|
|
|
|
Bisecting on the monorepo is straightforward, and very similar to the above,
|
|
except that the bisection script does not need to include the
|
|
`git submodule update` step.
|
|
|
|
The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::
|
|
|
|
git bisect start releases/3.9.x releases/3.8.x
|
|
git bisect run ./bisect_script.sh
|
|
|
|
With the `bisect_script.sh` script being::
|
|
|
|
#!/bin/sh
|
|
cd "$BUILD_DIR"
|
|
|
|
ninja clang || exit 125 # an exit code of 125 asks "git bisect"
|
|
# to "skip" the current commit
|
|
|
|
./bin/clang some_crash_test.cpp
|
|
|
|
Also, since the monorepo handles commits that update multiple projects at once,
you're less likely to encounter a build failure where one commit changes an API in
LLVM and a later one "fixes" the build in clang.
|
|
|
|
Moving Local Branches to the Monorepo
|
|
=====================================
|
|
|
|
Suppose you have been developing against the existing LLVM git
|
|
mirrors. You have one or more git branches that you want to migrate
|
|
to the "final monorepo".
|
|
|
|
The simplest way to migrate such branches is with the
|
|
``migrate-downstream-fork.py`` tool at
|
|
https://github.com/jyknight/llvm-git-migration.
|
|
|
|
Basic migration
|
|
---------------
|
|
|
|
Basic instructions for ``migrate-downstream-fork.py`` are in the
|
|
Python script and are expanded on below to a more general recipe::
|
|
|
|
# Make a repository which will become your final local mirror of the
|
|
# monorepo.
|
|
mkdir my-monorepo
|
|
git -C my-monorepo init
|
|
|
|
# Add a remote to the monorepo.
|
|
git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
|
|
|
|
# Add remotes for each git mirror you use, from upstream as well as
|
|
# your local mirror. All projects are listed here but you need only
|
|
# import those for which you have local branches.
|
|
my_projects=( clang
|
|
clang-tools-extra
|
|
compiler-rt
|
|
debuginfo-tests
|
|
libcxx
|
|
libcxxabi
|
|
libunwind
|
|
lld
|
|
lldb
|
|
llvm
|
|
openmp
|
|
polly )
|
|
for p in ${my_projects[@]}; do
|
|
git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
|
|
git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
|
|
done
|
|
|
|
# Pull in all the commits.
|
|
git -C my-monorepo fetch --all
|
|
|
|
# Run migrate-downstream-fork to rewrite local branches on top of
|
|
# the upstream monorepo.
|
|
(
|
|
cd my-monorepo
|
|
migrate-downstream-fork.py \
|
|
refs/remotes/local \
|
|
refs/tags \
|
|
--new-repo-prefix=refs/remotes/upstream/monorepo \
|
|
--old-repo-prefix=refs/remotes/upstream/split \
|
|
--source-kind=split \
|
|
--revmap-out=monorepo-map.txt
|
|
)
|
|
|
|
# Octopus-merge the resulting local split histories to unify them.
|
|
|
|
# Assumes local work on local split mirrors is on master (and
|
|
# upstream is presumably represented by some other branch like
|
|
# upstream/master).
|
|
my_local_branch="master"
|
|
|
|
git -C my-monorepo branch --no-track local/octopus/${my_local_branch} \
|
|
$(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \
|
|
refs/remotes/local/split/llvm/${my_local_branch})
|
|
git -C my-monorepo checkout local/octopus/${my_local_branch}
|
|
|
|
subproject_branches=()
|
|
for p in ${my_projects[@]}; do
|
|
subproject_branch=${p}/local/monorepo/${my_local_branch}
|
|
git -C my-monorepo branch ${subproject_branch} \
|
|
refs/remotes/local/split/${p}/${my_local_branch}
|
|
if [[ "${p}" != "llvm" ]]; then
|
|
subproject_branches+=( ${subproject_branch} )
|
|
fi
|
|
done
|
|
|
|
git -C my-monorepo merge ${subproject_branches[@]}
|
|
|
|
for p in ${my_projects[@]}; do
|
|
subproject_branch=${p}/local/monorepo/${my_local_branch}
|
|
git -C my-monorepo branch -d ${subproject_branch}
|
|
done
|
|
|
|
# Create local branches for upstream monorepo branches.
|
|
for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
|
|
refs/remotes/upstream/monorepo); do
|
|
upstream_branch=${ref#refs/remotes/upstream/monorepo/}
|
|
git -C my-monorepo branch upstream/${upstream_branch} ${ref}
|
|
done
|
|
|
|
The above gets you to a state like the following::
|
|
|
|
  U1 - U2 - U3 <- upstream/master
    \    \    \
     \    \    - Llld1 - Llld2 -
      \    \                    \
       \    - Lclang1 - Lclang2-- Lmerge <- local/octopus/master
        \                      /
         - Lllvm1 - Lllvm2-----
|
|
|
|
Each branched component has its branch rewritten on top of the
|
|
monorepo and all components are unified by a giant octopus merge.
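
For readers unfamiliar with octopus merges, the following sketch reproduces the
shape of ``Lmerge`` in a throwaway repository: three component branches with
non-overlapping changes are unified by one merge commit with three parents.
Branch names and file contents are invented for the illustration and do not
come from the migration tool.

```shell
# Demo identity so the commits below work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A single "upstream" commit to branch from.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "U1"
trunk=$(git -C "$repo" symbolic-ref --short HEAD)

# One local branch per component, each touching only its own directory.
for p in llvm clang lld; do
  git -C "$repo" checkout -q -b "$p/local" "$trunk"
  mkdir -p "$repo/$p"
  echo "local change" > "$repo/$p/file.txt"
  git -C "$repo" add "$p/file.txt"
  git -C "$repo" commit -q -m "L${p}1"
done

# Start from the llvm branch and merge the other components in one commit.
git -C "$repo" checkout -q -b local/octopus llvm/local
git -C "$repo" merge -q -m "Lmerge" clang/local lld/local >/dev/null

# The first field is the merge commit itself; the rest are its parents.
parent_count=$(git -C "$repo" rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
echo "Lmerge has ${parent_count} parents"
```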
|
|
|
|
If additional active local branches need to be preserved, the above
|
|
operations following the assignment to ``my_local_branch`` should be
|
|
done for each branch. Ref paths will need to be updated to map the
|
|
local branch to the corresponding upstream branch. If local branches
|
|
have no corresponding upstream branch, then the creation of
|
|
``local/octopus/<local branch>`` need not use ``git-merge-base`` to
|
|
pinpoint its root commit; it may simply be branched from the
|
|
appropriate component branch (say, ``llvm/local_release_X``).
|
|
|
|
Zipping local history
|
|
---------------------
|
|
|
|
The octopus merge is suboptimal for many cases, because walking back
|
|
through the history of one component leaves the other components fixed
|
|
at a history that likely makes things unbuildable.
|
|
|
|
Some downstream users track the order commits were made to subprojects
|
|
with some kind of "umbrella" project that imports the project git
|
|
mirrors as submodules, similar to the multirepo umbrella proposed
|
|
above. Such an umbrella repository looks something like this::
|
|
|
|
  UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master
  |        |             |        |        |        |        |
  Lllvm1   Llld1         Lclang1  Lclang2  Lllvm2   Llld2    Lmyproj1
|
|
|
|
The vertical bars represent submodule updates to a particular local
|
|
commit in the project mirror. ``UM3`` in this case is a commit of
|
|
some local umbrella repository state that is not a submodule update,
|
|
perhaps a ``README`` or project build script update. Commit ``UM8``
|
|
updates a submodule of local project ``myproj``.
|
|
|
|
The tool ``zip-downstream-fork.py`` at
|
|
https://github.com/greened/llvm-git-migration/tree/zip can be used to
|
|
convert the umbrella history into a monorepo-based history with
|
|
commits in the order implied by submodule updates::
|
|
|
|
  U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------                                      local/zip--.
     \         \              \                                                |
      - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
|
|
|
|
|
|
The ``U*`` commits represent upstream commits to the monorepo master
|
|
branch. Each submodule update in the local ``UM*`` commits brought in
|
|
a subproject tree at some local commit. The trees in the ``L*1``
|
|
commits represent merges from upstream. These result in edges from
|
|
the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
|
|
The ``L*2`` commits did not do any merges from upstream.
|
|
|
|
Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
|
|
if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
|
|
commit appearing after the ``Llld1`` commit would actually represent a
|
|
clang tree *earlier* in the upstream clang history. We want the
|
|
``local/zip`` branch to accurately represent the state of our umbrella
|
|
history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
|
|
clang's tree actually looks like in ``Lclang1``.
|
|
|
|
Even so, the edge ``U3 -> Llld1`` could be problematic for future
|
|
merges from upstream. git will think that we've already merged from
|
|
``U3``, and we have, except for the state of the clang tree. One
|
|
possible mitigation strategy is to manually diff clang between ``U2``
|
|
and ``U3`` and apply those updates to ``local/zip``. Another,
|
|
possibly simpler strategy is to freeze local work on downstream
|
|
branches and merge all submodules from the latest upstream before
|
|
running ``zip-downstream-fork.py``. If downstream merged each project
|
|
from upstream in lockstep without any intervening local commits, then
|
|
things should be fine without any special action. We anticipate this
|
|
to be the common case.
|
|
|
|
The tree for ``Lclang1`` outside of clang will represent the state of
|
|
things at ``U3`` since all of the upstream projects not participating
|
|
in the umbrella history should be in a state respecting the commit
|
|
``U3``. The trees for llvm and lld should correctly represent commits
|
|
``Lllvm1`` and ``Llld1``, respectively.
|
|
|
|
Commit ``UM3`` changed files not related to submodules and we need
|
|
somewhere to put them. It is not safe in general to put them in the
|
|
monorepo root directory because they may conflict with files in the
|
|
monorepo. Let's assume we want them in a directory ``local`` in the
|
|
monorepo.
|
|
|
|
**Example 1: Umbrella looks like the monorepo**
|
|
|
|
For this example, we'll assume that each subproject appears in its own
|
|
top-level directory in the umbrella, just as they do in the monorepo.
|
|
Let's also assume that we want the files in directory ``myproj`` to
|
|
appear in ``local/myproj``.
|
|
|
|
Given the above run of ``migrate-downstream-fork.py``, a recipe to
|
|
create the zipped history is below::
|
|
|
|
# Import any non-LLVM repositories the umbrella references.
|
|
git -C my-monorepo remote add localrepo \
|
|
https://my.local.mirror.org/localrepo.git
|
|
git -C my-monorepo fetch localrepo
|
|
|
|
subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
|
|
libcxx libcxxabi libunwind lld lldb llgo llvm openmp
|
|
parallel-libs polly pstl )
|
|
|
|
# Import histories for upstream split projects (this was probably
|
|
# already done for the ``migrate-downstream-fork.py`` run).
|
|
for project in ${subprojects[@]}; do
  git -C my-monorepo remote add upstream/split/${project} \
    https://github.com/llvm-mirror/${project}.git
  git -C my-monorepo fetch upstream/split/${project}
done
|
|
|
|
# Import histories for downstream split projects (this was probably
|
|
# already done for the ``migrate-downstream-fork.py`` run).
|
|
for project in ${subprojects[@]}; do
  git -C my-monorepo remote add local/split/${project} \
    https://my.local.mirror.org/${project}.git
  git -C my-monorepo fetch local/split/${project}
done
|
|
|
|
# Import umbrella history.
|
|
git -C my-monorepo remote add umbrella \
|
|
https://my.local.mirror.org/umbrella.git
|
|
git -C my-monorepo fetch umbrella
|
|
|
|
# Put myproj in local/myproj
|
|
echo "myproj local/myproj" > my-monorepo/submodule-map.txt
|
|
|
|
# Rewrite history
|
|
(
|
|
cd my-monorepo
|
|
zip-downstream-fork.py \
|
|
refs/remotes/umbrella \
|
|
--new-repo-prefix=refs/remotes/upstream/monorepo \
|
|
--old-repo-prefix=refs/remotes/upstream/split \
|
|
--revmap-in=monorepo-map.txt \
|
|
--revmap-out=zip-map.txt \
|
|
--subdir=local \
|
|
--submodule-map=submodule-map.txt \
|
|
--update-tags
|
|
)
|
|
|
|
# Create the zip branch (assuming umbrella master is wanted).
|
|
git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master
|
|
|
|
Note that if the umbrella has submodules to non-LLVM repositories,
``zip-downstream-fork.py`` needs to know about them to be able to
rewrite commits. That is why the first step above is to fetch commits
from such repositories.

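One way to find out which non-LLVM repositories an umbrella references is to
read the submodule URLs from its ``.gitmodules`` file. The sketch below is
illustrative only: it writes a small stand-in ``.gitmodules`` into a temporary
directory (the submodule names and URLs are assumptions that reuse the
placeholder hosts from the recipe above) and lists every URL that does not
point at ``llvm-mirror``.

```shell
set -eu

# Stand-in for the umbrella's .gitmodules; names and URLs are made up.
work=$(mktemp -d)
cat > "$work/.gitmodules" <<'EOF'
[submodule "llvm"]
    path = llvm
    url = https://github.com/llvm-mirror/llvm.git
[submodule "myproj"]
    path = myproj
    url = https://my.local.mirror.org/myproj.git
EOF

# git config can parse .gitmodules; each output line is "<key> <url>".
non_llvm_urls=$(git config --file "$work/.gitmodules" \
                    --get-regexp '^submodule\..*\.url$' |
                awk '$2 !~ /github\.com\/llvm-mirror\// { print $2 }')

echo "$non_llvm_urls"
```

In a real umbrella checkout the file would be read in place rather than
generated. Keep in mind that ``.gitmodules`` only describes the checked-out
revision, so older umbrella commits may reference repositories that no longer
appear in it.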
With ``--update-tags`` the tool will migrate annotated tags pointing
to submodule commits that were inlined into the zipped history. If
the umbrella pulled in an upstream commit that happened to have a tag
pointing to it, that tag will be migrated, which is almost certainly
not what is wanted. The tag can always be moved back to its original
commit after rewriting, or the ``--update-tags`` option may be
omitted, in which case any local tags must be migrated manually.

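Moving a tag back to its original commit can be scripted by recording tag
targets before the rewrite. The following is a minimal sketch in a throwaway
repository, not part of the migration tools: the tag name ``v1``, the commit
messages and the file ``tags-before.txt`` are all assumptions, and the
"rewrite" is simulated by force-moving the tag.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
# Wrapper so the example does not depend on global Git identity settings.
g() { git -C "$work/repo" -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "upstream commit"
original=$(g rev-parse HEAD)
g tag -a -m "release 1" v1 "$original"

# Before rewriting: record each annotated tag and the commit it points to.
g for-each-ref --format='%(refname:short) %(*objectname)' refs/tags \
    > "$work/tags-before.txt"

# Stand-in for the rewrite: the tag ends up on a different commit.
g commit -q --allow-empty -m "zipped commit"
g tag -f -a -m "release 1" v1 HEAD > /dev/null

# After rewriting: move every recorded tag back to its original commit.
while read -r name commit; do
    g tag -f -a -m "restored $name" "$name" "$commit" > /dev/null
done < "$work/tags-before.txt"

restored=$(g rev-parse 'v1^{commit}')
echo "v1 -> $restored"
```

This sketch recreates the tag object with a new message, so the original tag
message and tagger are not preserved. ``%(*objectname)`` is empty for
lightweight tags, which would need separate handling.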
**Example 2: Nested sources layout**

The tool handles nested submodules (e.g. llvm is a submodule in
umbrella and clang is a submodule in llvm). The file
``submodule-map.txt`` is a list of pairs, one per line. The first
item of each pair is the path to a submodule in the umbrella
repository. The second item is the path where trees for
that submodule should be written in the zipped history.

Let's say your umbrella repository is actually the llvm repository and
it has submodules in the "nested sources" layout (clang in
tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule
pointing to some downstream repository. The submodule map file should
look like this (we still want myproj mapped the same way as
previously)::

  tools/clang clang
  tools/clang/tools/extra clang-tools-extra
  projects/compiler-rt compiler-rt
  projects/debuginfo-tests debuginfo-tests
  projects/libclc libclc
  projects/libcxx libcxx
  projects/libcxxabi libcxxabi
  projects/libunwind libunwind
  tools/lld lld
  tools/lldb lldb
  projects/openmp openmp
  tools/polly polly
  projects/myproj local/myproj

If a submodule path does not appear in the map, the tool assumes it
should be placed in the same place in the monorepo. That means if you
use the "nested sources" layout in your umbrella, you *must* provide
map entries for all of the projects in your umbrella (except llvm).
Otherwise trees from submodule updates will appear underneath llvm in
the zipped history.

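The lookup rule described above can be illustrated with a few lines of shell.
This is only a sketch of the rule as stated, not the tool's actual
implementation; the helper name ``zipped_path`` and the two-entry map are
assumptions made for the example.

```shell
set -eu

# A tiny stand-in for submodule-map.txt.
map=$(mktemp)
printf '%s\n' 'tools/clang clang' 'projects/myproj local/myproj' > "$map"

# Hypothetical helper: print the destination for a submodule path, falling
# back to the path itself when the map has no entry for it.
zipped_path() {
    awk -v p="$1" '
        $1 == p { print $2; found = 1; exit }
        END     { if (!found) print p }
    ' "$map"
}

zipped_path tools/clang   # has a map entry
zipped_path tools/lld     # has no map entry, so the path is kept as-is
```

The second call shows why every nested project needs an entry: an unmapped
``tools/lld`` keeps its nested path instead of moving to a top-level ``lld``
directory.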
Because llvm is itself the umbrella, we use ``--subdir`` to write its
content into ``llvm`` in the zipped history::

  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo \
                                https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
                libcxx libcxxabi libunwind lld lldb llgo llvm openmp
                parallel-libs polly pstl )

  # Import histories for upstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add upstream/split/${project} \
                                  https://github.com/llvm-mirror/${project}.git
    git -C my-monorepo fetch upstream/split/${project}
  done

  # Import histories for downstream split projects (this was probably
  # already done for the ``migrate-downstream-fork.py`` run).
  for project in "${subprojects[@]}"; do
    git -C my-monorepo remote add local/split/${project} \
                                  https://my.local.mirror.org/${project}.git
    git -C my-monorepo fetch local/split/${project}
  done

  # Import umbrella history.  We want this under a different refspec
  # so zip-downstream-fork.py knows what it is.
  git -C my-monorepo remote add umbrella \
                                https://my.local.mirror.org/llvm.git
  git -C my-monorepo fetch umbrella

  # Create the submodule map.
  echo "tools/clang clang" > my-monorepo/submodule-map.txt
  echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
  echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
  echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
  echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
  echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
  echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
  echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
  echo "tools/lld lld" >> my-monorepo/submodule-map.txt
  echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
  echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
  echo "tools/polly polly" >> my-monorepo/submodule-map.txt
  echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt

  # Rewrite history
  (
    cd my-monorepo
    zip-downstream-fork.py \
      refs/remotes/umbrella \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --old-repo-prefix=refs/remotes/upstream/split \
      --revmap-in=monorepo-map.txt \
      --revmap-out=zip-map.txt \
      --subdir=llvm \
      --submodule-map=submodule-map.txt \
      --update-tags
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master

Comments at the top of ``zip-downstream-fork.py`` describe in more
detail how the tool works and various implications of its operation.

Importing local repositories
----------------------------

You may have additional repositories that integrate with the LLVM
ecosystem, essentially extending it with new tools. If such
repositories are tightly coupled with LLVM, it may make sense to
import them into your local mirror of the monorepo.

If such repositories participated in the umbrella repository used
during the zipping process above, they will automatically be added to
the monorepo. For downstream repositories that don't participate in
an umbrella setup, the ``import-downstream-repo.py`` tool at
https://github.com/greened/llvm-git-migration/tree/import can help with
getting them into the monorepo. A recipe follows::

  # Import downstream repo history into the monorepo.
  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
  git -C my-monorepo fetch myrepo

  my_local_tags=( refs/tags/release
                  refs/tags/hotfix )

  (
    cd my-monorepo
    import-downstream-repo.py \
      refs/remotes/myrepo \
      "${my_local_tags[@]}" \
      --new-repo-prefix=refs/remotes/upstream/monorepo \
      --subdir=myrepo \
      --tag-prefix="myrepo-"
  )

  # Preserve release branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                 refs/remotes/myrepo/release); do
    branch=${ref#refs/remotes/myrepo/}
    git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
  done

  # Preserve master.
  git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master

  # Merge master.
  git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
  git -C my-monorepo merge myrepo/master

You may want to merge other corresponding branches, for example
``myrepo`` release branches if they were in lockstep with LLVM project
releases.

``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
annotated tags with the given prefix. Due to limitations with
``fast_filter_branch.py``, unannotated tags cannot be renamed
(``fast_filter_branch.py`` considers them branches, not tags). Since
the upstream monorepo had its tags rewritten with an "llvmorg-"
prefix, name conflicts should not be an issue. ``--tag-prefix`` can
be used to more clearly indicate which tags correspond to various
imported repositories.

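Because only annotated tags are renamed, it can help to check beforehand which
tags in a repository are unannotated (lightweight). The sketch below does this
in a throwaway repository using ``git for-each-ref``; the tag names
``release-1`` and ``hotfix-1`` are assumptions made for the example.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
# Wrapper so the example does not depend on global Git identity settings.
g() { git -C "$work/repo" -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "initial commit"
g tag -a -m "annotated release tag" release-1   # annotated: a tag object
g tag hotfix-1                                  # lightweight: a plain ref

# %(objecttype) is "tag" for annotated tags and "commit" for lightweight
# tags that point directly at a commit.
lightweight=$(g for-each-ref --format='%(objecttype) %(refname:short)' refs/tags |
              awk '$1 == "commit" { print $2 }')

echo "$lightweight"
```

Any tag listed this way would keep its original name after the import, so it
may be worth converting it to an annotated tag first or renaming it by hand
afterwards.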
Given this repository history::

  R1 - R2 - R3 <- master
       ^
       |
    release/1

The above recipe results in a history like this::

  U1 - U2 - U3 <- upstream/master
    \   \    \
     \   -----\---------------                                          local/zip--.
      \        \              \                                                    |
    - Lllvm1 - Llld1 - UM3 -  Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
                                                                               /
                                                                   R1 - R2 - R3  <-.
                                                                        ^          |
                                                                        |          |
                                                                myrepo-release/1   |
                                                                                   |
                                                                    myrepo/master--'

Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
from ``myrepo``. If you require commits from ``myrepo`` to be
interleaved with commits on local project branches (for example,
interleaved with ``Lllvm1``, ``Lllvm2``, etc. above) and ``myrepo`` doesn't
appear in an umbrella repository, a new tool will need to be
developed. Creating such a tool would involve:

1. Modifying ``fast_filter_branch.py`` to optionally take a
   revlist directly rather than generating it itself

2. Creating a tool to generate an interleaved ordering of local
   commits based on some criteria (``zip-downstream-fork.py`` uses the
   umbrella history as its criterion)

3. Generating such an ordering and feeding it to
   ``fast_filter_branch.py`` as a revlist

Some care will also likely need to be taken to handle merge commits,
to ensure the parents of such commits migrate correctly.

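As a rough illustration of step 2, one possible criterion is the committer
timestamp. The sketch below is an assumption-laden example, not an existing
tool: it builds two unrelated branches in a throwaway repository (the branch
names, commit subjects and dates are made up) and prints their commits in a
single time-ordered sequence.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
g() { git -C "$work/repo" -c user.name=Example -c user.email=example@example.com "$@"; }

# commit_at <unix-time> <subject>: create an empty commit with a fixed date.
commit_at() {
    GIT_AUTHOR_DATE="$1 +0000" GIT_COMMITTER_DATE="$1 +0000" \
        git -C "$work/repo" -c user.name=Example -c user.email=example@example.com \
            commit -q --allow-empty -m "$2"
}

# Two histories with no common ancestor, like a local project and myrepo.
g checkout -q --orphan local-llvm
commit_at 1546300801 Lllvm1
commit_at 1546300803 Lllvm2
g checkout -q --orphan myrepo
commit_at 1546300802 R1

# Interleave by committer time: "<time> <hash> <subject>", oldest first.
# Field 2 holds the hashes a real revlist would use; field 3 is shown here
# only because subjects are easier to read.
order=$(g log --format='%ct %H %s' local-llvm myrepo |
        sort -n | cut -d' ' -f3 | paste -sd' ' -)

echo "$order"
```

Timestamps alone do not guarantee that parents sort before their children
(clock skew and rebased commits can break that), which is one reason merge
commits need the extra care mentioned above.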
Scrubbing the Local Monorepo
----------------------------

Once all of the migrating, zipping and importing is done, it's time to
clean up. The Python tools use ``git-fast-import``, which leaves a lot
of cruft around, and we want to shrink our new monorepo mirror as much
as possible. Here is one way to do it::

  git -C my-monorepo checkout master

  # Delete branches we no longer need.  Do this for any other branches
  # you merged above.
  git -C my-monorepo branch -D local/zip/master || true
  git -C my-monorepo branch -D local/octopus/master || true

  # Remove remotes.
  git -C my-monorepo remote remove upstream/monorepo

  for p in "${my_projects[@]}"; do
    git -C my-monorepo remote remove upstream/split/${p}
    git -C my-monorepo remote remove local/split/${p}
  done

  git -C my-monorepo remote remove localrepo
  git -C my-monorepo remote remove umbrella
  git -C my-monorepo remote remove myrepo

  # Add anything else here you don't need.  refs/tags/release is
  # listed below assuming tags have been rewritten with a local prefix.
  # If not, remove it from this list.
  refs_to_clean=(
    refs/original
    refs/remotes
    refs/tags/backups
    refs/tags/release
  )

  git -C my-monorepo for-each-ref --format="%(refname)" "${refs_to_clean[@]}" |
    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d

  git -C my-monorepo reflog expire --all --expire=now

  # fast_filter_branch.py might have gc running in the background.
  while ! git -C my-monorepo \
    -c gc.reflogExpire=0 \
    -c gc.reflogExpireUnreachable=0 \
    -c gc.rerereresolved=0 \
    -c gc.rerereunresolved=0 \
    -c gc.pruneExpire=now \
    gc --prune=now; do
    continue
  done

  # Takes a LOOOONG time!
  git -C my-monorepo repack -A -d -f --depth=250 --window=250

  git -C my-monorepo prune-packed
  git -C my-monorepo prune

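Before uploading, it may be worth confirming that the cleanup left no stray
refs behind. The sketch below demonstrates such a check in a throwaway
repository; the leftover ref names are assumptions that imitate what the
migration might leave, and a plain ``while read`` loop stands in for the
``xargs`` pipeline above.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/repo"
g() { git -C "$work/repo" -c user.name=Example -c user.email=example@example.com "$@"; }

g commit -q --allow-empty -m "initial commit"

# Simulate leftovers from the migration (ref names are made up).
g update-ref refs/original/refs/heads/master HEAD
g update-ref refs/remotes/umbrella/master HEAD

# Delete every ref under the namespaces we want scrubbed.
g for-each-ref --format='%(refname)' refs/original refs/remotes |
while read -r ref; do
    g update-ref -d "$ref"
done

# Verify: nothing should remain under those namespaces.
leftover=$(g for-each-ref --format='%(refname)' refs/original refs/remotes |
           wc -l | tr -d ' ')
echo "leftover refs: $leftover"

# Report object counts and on-disk size for a before/after comparison.
g count-objects -v
```

Running ``git count-objects -v`` before and after the ``gc`` and ``repack``
steps gives a simple measure of how much space the scrub recovered.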
You should now have a trim monorepo. Upload it to your Git server, and
happy hacking!

References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/