[gst-devel] Fwd: submodules' shortcomings

Felipe Contreras felipe.contreras at gmail.com
Tue Jan 5 11:08:44 CET 2010


Hi,

Johannes Schindelin (gitte), one of the top contributors to the git
project[1], describes what's wrong with submodules and why support
is unlikely to improve any time soon. I remember someone didn't believe
me when I said that, so I thought it might be a good idea to share
this.

Cheers.

[1] https://www.ohloh.net/p/git/contributors?page=1

---------- Forwarded message ----------
From: Johannes Schindelin <Johannes.Schindelin at gmx.de>
Date: Tue, Jan 5, 2010 at 12:29 AM
Subject: submodules' shortcomings, was Re: RFC: display dirty
submodule working directory in git gui and gitk
To: Jens Lehmann <Jens.Lehmann at web.de>
Cc: Git Mailing List <git at vger.kernel.org>, Junio C Hamano
<gitster at pobox.com>, "Shawn O. Pearce" <spearce at spearce.org>, Paul
Mackerras <paulus at samba.org>, Heiko Voigt <hvoigt at hvoigt.net>, Lars
Hjemli <hjemli at gmail.com>


Hi,

On Mon, 4 Jan 2010, Jens Lehmann wrote:

> Am 04.01.2010 10:44, schrieb Johannes Schindelin:
> > The real problem is that submodules in the current form are not very
> > well designed.
>
> IMVHO using the tree sha1 for a submodule seems to be the 'natural' way
> to include another git repo. And it gives the reproducibility I expect
> from an SCM. Or am I missing something?

You do remember the discussion at the Alles wird Git about the need for
Subversion external-like behavior, right?

> It looks to me as most shortcomings come from the fact that most git
> commands tend to ignore submodules (and if they don't, like git gui and
> gitk do now, they e.g. only show certain aspects of their state).

It is not only ignoring.  It is not being able to cope with the state only
submodules can be in (see below).

> Submodules have been in heavy use in our company since last year.
> Virtually every patch I submitted for submodules came from that
> experience and scratched an itch I or one of my colleagues had (and the
> situation did already improve noticeably by the few things we changed).
> We are still convinced that using submodules was the right decision. But
> some work still has to be done to be able to use them easily and to get
> rid of some pitfalls.

Submodules may be the best way you have in Git for your workflow ATM.
But that does not mean that the submodule design is in any way
thought-through.

Just a few shortcomings that do show up in my main project (and to a
small extent in msysGit, as you are probably aware):

- submodules were designed with a strong emphasis on not being forced to
 check them out.  But Git makes it very inconvenient to actually check
 submodules out, let alone check them out at clone-time.  And it is
 outright impossible to _enforce_ a submodule to be checked out.

- among other use cases, submodules are recommended for sharing content
 between two different repositories. But it is part of the design that it
 is _very_ easy to forget to commit or push the changes in the submodule
 that are required for the integrity of the superproject.

- that use case -- sharing content between different repositories -- is
 not really supported by submodules, but rather an afterthought.  This is
 all too obvious when you look at the restriction that the shared content
 must be in a single subdirectory.

- submodules would be a perfect way to provide a fast-forward-only media
 subdirectory that is written to by different people (artists) than the
 superproject (developers).  But there is no mechanism to enforce
 shallow fetches, which means that this use case cannot be handled
 efficiently using Git.

- related are the use cases where it is desired not to have a fixed
 submodule tip committed to the superproject, but always to update to the
 current, say, master (like Subversion's externals).  This use case has
 been wished away by the people who implemented submodules in Git.  But
 reality has this nasty habit of ignoring your wishes, does it not?

- there have been patches supporting rebasing submodules, i.e.
 submodules where a "git submodule update" rebases the current branch to
 the revision committed to the superproject rather than detaching the
 HEAD, which everybody who ever contributed to a project with submodules
 should agree is a useful thing. But the patches have only been discussed
 to death, to the point where the discussion's information content was
 converging to zero, yet the patches did not make it into Git.  (FWIW
 this is one reason why I refuse to write patches to git-submodule.sh: I
 refuse to let my time be wasted like that.)

- working directories with GIT_DIRs are a very different beast from single
 files.  That alone leads to a _lot_ of problems.  The original design of
 Git had only a couple of states for named content (AKA files): clean,
 added, removed, modified.  The states that are possible with submodules
 are for the most part not handled _at all_ by most Git commands (and it
 is sometimes very hard to decide what the best way to handle those
 states would even be).  Just think of a submodule at a different
 revision than committed in the superproject, with uncommitted changes,
 ignored and unignored files, a few custom hooks, a bit of additional
 metadata in the .git/config, and just for fun, a few temporary files in
 .git/ which are used by the hooks.

- while it might be called clever that the submodules' metadata are stored
 in .gitmodules in the superproject (and are therefore naturally tracked
 with Git), the synchronization with .git/config is performed exactly
 once -- when you initialize the submodule.  You are likely to miss out
 on _every_ change you pulled into the superproject.
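
To make the last item concrete, here is a minimal sketch of how the drift
between the tracked .gitmodules and the local .git/config could be
detected.  The function names submodule_urls and stale_urls are
hypothetical, and the parser assumes only the simple
[submodule "name"] / key = value layout rather than full git-config
syntax.

```python
import re

# Matches section headers such as: [submodule "lib"]
SECTION = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]\s*$')


def submodule_urls(config_text):
    """Map submodule name -> url for git-config style text (simplified)."""
    urls = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("["):
            match = SECTION.match(line)
            # Any non-submodule section (e.g. [core]) resets the context.
            current = match.group("name") if match else None
            continue
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "url":
            urls[current] = value
    return urls


def stale_urls(gitmodules_text, git_config_text):
    """Return {name: (tracked_url, local_url)} for initialized submodules
    whose URL in .git/config no longer matches .gitmodules."""
    tracked = submodule_urls(gitmodules_text)
    local = submodule_urls(git_config_text)
    return {
        name: (tracked[name], local[name])
        for name in tracked
        if name in local and tracked[name] != local[name]
    }
```

Feeding it the contents of both files yields the submodules whose URL
changed upstream after initialization.  "git submodule sync" can copy
the tracked URLs over afterwards, but it has to be run by hand.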

All in all, submodules are very clumsy to work with, and you are literally
forced to provide scripts in the superproject to actually work with the
submodules.
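
A helper script of the kind mentioned above might start by classifying
the output of "git submodule status", which prefixes each line with "-"
(not initialized), "+" (checked-out commit differs from the one recorded
in the superproject) or "U" (merge conflicts).  The following is only a
sketch: the classify function is hypothetical, and it assumes paths
without whitespace.

```python
# Prefix characters documented for `git submodule status`.
STATES = {
    "-": "not initialized",
    "+": "checked-out commit differs from superproject",
    "U": "merge conflicts",
    " ": "in sync",
}


def classify(status_output):
    """Return a list of (path, state) tuples, one per submodule line."""
    result = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:]
        fields = rest.split()  # sha1, path, optional "(describe output)"
        if len(fields) < 2 or prefix not in STATES:
            continue  # ignore lines that do not look like status output
        result.append((fields[1], STATES[prefix]))
    return result
```

Note that this covers only the states the status output reports.
Custom hooks, extra metadata in .git/config and similar submodule-only
state listed above are invisible to it.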

> > In the short run, we can paper over the shortcomings of the submodules
> > by introducing a command line option "--include-submodules" to
> > update-refresh, diff-files and diff-index, though.
>
> Maybe this is the way to go for now (and hopefully we can turn this
> option on by default later because we did the right thing ;-).

I do not think that --include-submodules is a good default.  It is just
too expensive in terms of I/O even to check the status in a superproject
with a lot of submodules.

Besides, as long as there is enough reason to have out-of-Git alternative
solutions such as repo, submodules deserve to be 2nd-class citizens.

Ciao,
Dscho

-- 
Felipe Contreras



