Linus Torvalds on how and when to maintain a clean git history (2009) (www.mail-archive.com)
from HaraldvonBlauzahn@feddit.org to programming@programming.dev on 03 Aug 07:50
https://feddit.org/post/16756716

I want clean history, but that really means (a) clean and (b) history.

People can (and probably should) rebase their private trees (their own work). That’s a cleanup. But never other people’s code. That’s a “destroy history”.

So the history part is fairly easy. There’s only one major rule, and one minor clarification:

  • You must never EVER destroy other people’s history. You must not rebase commits other people did.

[…]

If you are working with git together with other people, it’s worth a read.

#programming

HaraldvonBlauzahn@feddit.org on 03 Aug 08:00

Have you ever seen or created something like this?

This might explain why it might be useful to re-write history at all, and why tools like jujutsu or gerrit are interesting, in specific contexts.

Flipper@feddit.org on 03 Aug 10:15

In a project I’m in there are 20 commits just labeled “.”. The only reason I haven’t slapped them silly is that they left before I started.

Everyday0764@lemmy.zip on 07 Aug 17:43

I have a project with 10 PRs and 30 commits like that. They were setting up the CI/CD and got tired of writing meaningful commit messages.

Kissaki@programming.dev on 03 Aug 11:50

While exploring solutions, I use f or ff to mean “follow-up/to-squash” and a to mean logically separate. Sometimes other (additional) short abbreviations to know where to move, squash, and edit the changes to.

Other than maybe initial development until the first stable/usable version, these never persist, though. And even then, only if it’s not a collaborative project. If it is shared or collaborative, “Iterate on x” is preferable as a non-descriptive title.

I guess my commit descriptions get better with project lifetime, not worse.

Fred@programming.dev on 03 Aug 12:27

While exploring solutions, I use f or ff to mean “follow-up/to-squash” and a to mean logically separate. Sometimes other (additional) short abbreviations to know where to move, squash, and edit the changes to.

I recently discovered git commit --fixup=abcd1234: it will make a new commit with a message of fixup! <message from abcd1234>. (It’s the only special thing that flag does: a specially formatted commit message, which you can craft yourself if you remember the spelling of the fixup! marker.)

When you later rebase, git rebase --interactive --autosquash will automatically mark that commit to be a fixup of abcd1234.

Magit for Emacs has a shortcut for creating a fixup commit by selecting the previous commit; I’m sure other interfaces do too.
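To make the reordering concrete, here is a minimal Python sketch of what --autosquash does to the rebase todo list. It is a toy model, not git's implementation: it assumes exact subject matching and only handles the fixup! marker, and the commit hashes other than abcd1234 and all subjects are made up for illustration.

```python
from typing import List, Optional, Tuple

# One rebase todo entry: (action, short hash, subject line).
TodoEntry = Tuple[str, str, str]

FIXUP_PREFIX = "fixup! "


def autosquash(todo: List[TodoEntry]) -> List[TodoEntry]:
    """Toy model of the todo-list reordering done by --autosquash.

    Assumption: a "fixup! X" commit targets the earlier pick whose
    subject is exactly X. Real git is more lenient when matching.
    """
    result: List[TodoEntry] = []
    for action, commit, subject in todo:
        if subject.startswith(FIXUP_PREFIX):
            target = subject[len(FIXUP_PREFIX):]
            # Look for an earlier pick with the target subject.
            index: Optional[int] = next(
                (i for i, (a, _, s) in enumerate(result)
                 if a == "pick" and s == target),
                None,
            )
            if index is not None:
                # Place it after the target and any fixups already attached.
                insert_at = index + 1
                while insert_at < len(result) and result[insert_at][0] == "fixup":
                    insert_at += 1
                result.insert(insert_at, ("fixup", commit, subject))
                continue
        # Ordinary commits (and fixups without a target) keep their place.
        result.append((action, commit, subject))
    return result


if __name__ == "__main__":
    todo = [
        ("pick", "abcd1234", "Add parser"),
        ("pick", "bbbb2222", "Add docs"),
        ("pick", "cccc3333", "fixup! Add parser"),
    ]
    for entry in autosquash(todo):
        print(*entry)
```

In this example the fixup commit moves directly below abcd1234 and its action changes from pick to fixup, which is the todo list the interactive rebase would then present.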

I guess my commit descriptions get better with project lifetime

I’ve found that too, which I think is because as the project matures, you’re more likely to make fixes or contained features, as opposed to regular “change everything” as you explore the design in a young project.

GissaMittJobb@lemmy.ml on 03 Aug 21:05

That kind of commit quality should only really be permissible on private projects, and as a reviewer, it’s arguably acceptable to reject PRs with this kind of history.

You should be writing your commits for the benefit of the code reviewer - structure them in a logical fashion to tell the story about the changes you want to get merged.

For non-trivial branches I usually soft reset to the point where all code is unstaged and uncommitted and then curate the commits to align with what the reviewer should be reading. It’s not uncommon for me to have several branches containing a single “wip”-commit which I amend onto while building up the full code for the branch.

magic_lobster_party@fedia.io on 03 Aug 10:28

I think this is dependent on context. Linus is working with a very public repository. Private repositories shared with a small team have different conditions.

What works in my smallish team at my company is:

  • Enable squash commits. Each PR should be squashed to a single commit. This makes the master branch linear and simple. This ensures each individual commit on master has been reviewed and is in a working state.
  • You can do whatever shit you want on your own branch. It’s going to be squashed anyway.
  • Don’t base your work on some other team member’s branch, unless agreed upon. That’s their work. You should only depend on the master branch.
  • Never rewrite what has already been committed to the master branch.

Kissaki@programming.dev on 03 Aug 12:14

Enable squash commits. Each PR should be squashed to a single commit. This makes the master branch linear and simple. This ensures each individual commit on master has been reviewed and is in a working state.

In non-minimal changesets, I would miss information/documentation about individual logical changes that make up the changeset. Commit separation that is useful for review will also be useful for history.

I prefer a deliberate, rebase- and rewrite-heavy workflow with a semi-linear history. The linear history remains readable, while allowing sum-of-parts changesets/merges.

It’s an investment, but I think it guides you toward good structure and thinking, and whenever you look at history, you have more than a squashed potential mess.

Squash-on-merge is simpler to implement and justify, of course. Certainly much better than “never rebase, never rewrite, always merge”, which I am baffled some teams have no problem doing. The history tree quickly becomes unreadable.

magic_lobster_party@fedia.io on 03 Aug 12:28

What I like about squash on merge is that I don’t need to worry about the shit my coworkers make. My coworkers can have terrible git discipline, and the master branch is still clean.

killeronthecorner@lemmy.world on 03 Aug 12:49

This is the reality. You’ll spend most of your time working with people of varying SCM skill levels, spread all the way across the spectrum. Squash commits combined with centralised auditing (GHE, GitLab, etc) add the necessary guardrails to keep a clean history on main and to make building-block change sets easily revert-able.

In my decades working on large teams of engineers, being able to identify changes by WIP/interim commits has never been terribly useful, for the reason you describe: everyone has different git hygiene procedures, and most corps don’t give a tiny little shit about maintaining that level of hygiene unless you’re white room / highly regulated.

And if you do want that level of depth you can go find the PR/MR in the central source where the revision history of the dead branch is often sustained (unless you configure it not to).

But yeah, I call YAGNI a lot on git history purists to this day. It’s a huge amount of effort and coordination to retain a tiny amount of value that is 50/50 gonna be useful depending on the git hygiene of the person who wrote it. Save your efforts and just read the damn code.

tatterdemalion@programming.dev on 03 Aug 16:04

There are CI tools like Prow and Tide which make it possible to use squash by default while still giving control to developers who want to use a different merge strategy.

HaraldvonBlauzahn@feddit.org on 03 Aug 16:59

Commit separation that is useful for review will also be useful for history.

It is also useful when running git bisect (aka “The Alaskan Wolf Fence Method”) on nasty bugs, e.g. ones involving concurrency or UB issues.

It is also a potential downside of rebasing that it can (sometimes) invalidate interim tests.
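For readers who have not used bisect, the following Python sketch shows the binary search idea behind it. It is a simplified model under stated assumptions: the history is linear, the oldest commit is known good, the newest is known bad, and the commit names and the is_bad check are hypothetical stand-ins for a real test run. Real git bisect works on the full commit graph.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    commits is ordered oldest to newest. Assumption: commits[0] is
    known good and commits[-1] is known bad, as when starting a bisect.
    """
    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] is good, commits[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]  # c0 (oldest) .. c7 (newest)
    # Hypothetical test: everything from c5 onwards shows the bug.
    print(first_bad_commit(history, lambda c: int(c[1:]) >= 5))
```

The search can only narrow the bug down to a single commit in the history it is given, so with squashed PRs it stops at the whole PR, and every commit it lands on has to build and be testable for the check to mean anything.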

FizzyOrange@programming.dev on 03 Aug 19:14

Not really because I’ve never seen a setup that requires every commit in a branch to compile and pass tests. Only the merge commit needs to.

Also if your PR is so big that it would be painful to bisect within it, then it should be broken into smaller PRs.

GissaMittJobb@lemmy.ml on 03 Aug 21:00

In non-minimal changesets, I would miss information/documentation about individual logical changes that make up the changeset.

It’s usually possible to find this by navigating back to the PR which you can find referenced in the squash commit.

I guess this might be a larger problem for codebases not following a trunk-based approach, where PRs grow to very large sizes before going into the mainline branch.

Kissaki@programming.dev on 04 Aug 16:53

Review iterations mean messy commits there though. And full documented history in Git seems preferable because you don’t have to switch tools, and for persistence and robustness too, in case of repo/review platform changes (switching platforms etc).

wccrawford@discuss.online on 03 Aug 13:24

That sounds like pretty much exactly what we did at my last job, and it worked pretty well IMO. The individual commits in a PR didn’t ever matter. I don’t even think we used them for code review, except if it came up for review a second time after rework. In that case, we were able to just look at the new commit to see if the right changes were made.

And we definitely avoided basing off each other’s branches. We had to do it a few times. The only times it went well were when the intent was to merge the child branch into the feature branch. If they were actually separate tickets (and the second relied on the first) it was generally chaotic. But sometimes, it was just necessary.

fitgse@sh.itjust.works on 03 Aug 14:01

If ‘--first-parent’ was the default way that git log worked, I don’t think we’d even be having this argument over how to merge branches.

In my opinion, the best strategy is to always use a merge commit, and then when viewing master, always use --first-parent, which will ONLY show commits on master. This gives you:

  • a very clean, linear history
  • the ability to let people work in their branches in their own way (it is ok to merge master into your branch multiple times without rebasing)
  • you can dig into the history of any branch if needed
  • it makes it easy to backport changes as you can cherry-pick out the merge commit which contains everything.

The problem is just the default log view of git and tools.
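A small Python sketch of the first-parent idea, for illustration only. The commit graph and all commit names are invented, and the model assumes every branch lands on master through a real merge commit made while on master, so that master is always the first parent.

```python
from typing import Dict, List, Optional

# Hypothetical commit graph: commit -> parents, first parent first.
# For a merge, the first parent is the branch that was merged INTO.
PARENTS: Dict[str, List[str]] = {
    "merge-b": ["merge-a", "b2"],  # merge of feature-b into master
    "b2": ["b1"],
    "b1": ["merge-a"],
    "merge-a": ["root", "a2"],     # merge of feature-a into master
    "a2": ["a1"],
    "a1": ["root"],
    "root": [],
}


def first_parent_log(parents: Dict[str, List[str]], tip: str) -> List[str]:
    """Walk from tip following only the first parent of each commit."""
    history: List[str] = []
    current: Optional[str] = tip
    while current is not None:
        history.append(current)
        commit_parents = parents.get(current, [])
        # Ignore second and later parents: those are the merged branches.
        current = commit_parents[0] if commit_parents else None
    return history


if __name__ == "__main__":
    print(first_parent_log(PARENTS, "merge-b"))
```

Starting from merge-b, the walk visits only merge-a and root, so the branch commits a1, a2, b1 and b2 stay out of the view while remaining reachable in the graph. The linear view depends on the assumption above: a fast-forward, or a merge made in the other direction, changes which line the first parent follows.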

FizzyOrange@programming.dev on 03 Aug 19:19

The only rule you need is: preserve history that is worth preserving.

99% of the time, that means you should squash commits in a PR. Most PRs should be small enough that they don’t need more fine-grained history than one commit.

I will grant a couple of exceptions:

  1. Sometimes you have refactorings where you e.g. move a load of files and then do something else… Or do a big search and replace and then fix the errors. In these cases it’s nice to have the file moves or search/replace in separate commits to a) make review easier, b) make the significant changes easier to see, and c) let git track file moves reliably.
  2. Sometimes you have a very long lived feature branch that multiple people have worked on for months. That can be worth keeping history for.

Unfortunately, if you enable merge queues on GitHub it forces you to pick one method for all PRs, which is kind of dumb. We just use squash merges for everything and accept that sometimes it’s not the best.