Add optional date argument to builtins.fetchGit #7362

Open
wants to merge 1 commit into
base: master

Gabriella439
Contributor

Gabriella439 commented Nov 28, 2022

This allows users to specify an absolute or relative date (basically, anything that git accepts as a date specification) when fetching a repository.

The motivation for this change is to enable incremental builds of Haskell packages in this Nixpkgs pull request:

NixOS/nixpkgs#204020

… inspired by this blog post:

https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/

Keep in mind that this new feature can power incremental builds for other package managers, too. There is not much that is Haskell-specific about this feature.

The basic idea is that instead of Nix doing a full build for a package, we split every build into two builds:

  • A full build at an older point in time

    e.g. a daily or weekly time boundary

  • An incremental build relative to the last full build

    This incremental build reuses the build products left over from the most recent full build.

In order to do this, though, we need a way to "snap" a package's git source input to an earlier point in time (e.g. a daily boundary or weekly boundary). This would allow multiple incremental builds to share the same full rebuild if they snap to the same time boundary.
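The "snap" step amounts to rounding a timestamp down to a multiple of the chosen interval. Here is a minimal Nix sketch of just that arithmetic, assuming timestamps are Unix seconds; the helper name `snap` is hypothetical and not part of this PR:

```nix
let
  # Hypothetical helper: round a Unix timestamp (in seconds) down to the
  # start of its `interval`-second window.  Nix's `/` on integers is integer
  # division, so for non-negative timestamps this floors to the boundary.
  snap = interval: timestamp: (timestamp / interval) * interval;

  day = 24 * 60 * 60;
in
  # Two builds an hour apart on the same UTC day land on the same boundary,
  # so both would reuse the same full build.
  assert snap day 1700000000 == snap day 1700003600;
  snap day 1700000000
```

Saved to a file and evaluated with `nix-instantiate --eval`, this should print `1699920000`, i.e. 2023-11-14 00:00 UTC.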

The two main approaches I considered were:

  • Approach 1 (this PR)

    Patch Nix to add a date argument to builtins.fetchGit

  • Approach 2

    • Patch nix-prefetch-git to support a new --date option

    • Disable the sandbox

    • Run nix-prefetch-git at evaluation time using import-from-derivation to fetch and rehash the repository at an older point in time

Approach 1 seemed the more desirable of the two.
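For illustration, a call with this patch applied might look roughly like the following. This is a sketch rather than documentation: `date` exists only with this PR, the URL is a placeholder, and it assumes (as the `truncate` comment later in this thread describes) that Nix selects the last commit before the given date.

```nix
# Sketch only: `date` is the argument proposed by this PR and is not part of
# any released `builtins.fetchGit`.  The URL below is a placeholder.
builtins.fetchGit {
  url = "https://example.com/your/repository.git";
  ref = "main";
  # Any date specification `git` understands, e.g. an absolute date such as
  # "2022-11-28" or a relative one such as "1 week ago".
  date = "2022-11-28";
}
```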

Gabriella439 added a commit to MercuryTechnologies/nixpkgs that referenced this pull request Dec 1, 2022
This adds a new `incremental` utility for Haskell CI that
supports incremental builds based on the approach outlined
in this blog post:

https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/

The basic idea is that instead of Nix doing a full build
for a package, we split every build into two builds:

- A full build at an older point in time

  e.g. a daily or weekly time boundary

- An incremental build relative to the last full build

  This incremental build reuses the build products left
  over from the most recent full build.

In order to do this, though, we need a way to "snap" a
package's `git` source input to an earlier point in time
(e.g. a daily boundary or weekly boundary).  This would
allow multiple incremental builds to share the same
full rebuild if they snap to the same time boundary.

The approach I went with to make that possible was to
extend Nix's `builtins.fetchGit` to support a new `date`
argument and you can find the corresponding PR for that
here:

NixOS/nix#7362

That is why the `incremental` utility added here requires
a sufficiently new version of Nix (one that would incorporate
that change, presuming it is merged).

This also requires GHC 9.4 or newer in order to pick up a fix
to GHC's change detection logic, as described in more detail
in the above blog post.

However, if you satisfy those requirements then this works
exactly the way you'd expect: all of the incremental builds
only have to build the diff since the last time boundary.
Moreover, if CI caches the full build then developers can
also run `nix build` locally and only have to build the diff, too.
fricklerhandwerk added the feature label ("Feature request or proposal") Dec 5, 2022
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixpkgs-support-for-incremental-haskell-builds/24115/3

@blaggacao
Contributor

blaggacao commented Dec 20, 2022

Couldn't another approach be to build builtins.unsafeDiscardOutputDependency in a first pass? That would presumably cover the inputs, which are likely less volatile in the target scenario.

No. → https://felixspringer.xyz/homepage/blog/incrementalHaskellBuildsWithNix had a concise answer for me.

wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Dec 21, 2022
wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jan 17, 2023
wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jan 17, 2023
wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jan 17, 2023
wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jan 19, 2023
@Gabriella439
Contributor Author

I'm not sure what the etiquette or protocol is for pinging someone for a review in this repository.

@jappeace

jappeace commented Jan 31, 2023

Go to the Nix dev Matrix channel and find someone there (at least, that's how it worked for nixpkgs).

Member

Ericson2314 left a comment

I say let's put this behind an unstable feature and then accept it. I hope there will be better solutions for this problem so we can rip out this feature later, but I recognize those alternatives are not yet available.

roberth added the fetching label ("Networking with the outside (non-Nix) world, input locking") May 12, 2023
@thufschmitt
Member

Nix team meeting notes

That is a bit ad hoc, and highly impure. It's a very clever abuse of Nix, but an abuse nonetheless, so we wouldn't want this to go upstream, especially since it can be worked around with a bit of infrastructure (like setting up a cron job to update a tag/branch that Nix can pull from).

@Ericson2314
Member

Also the team reminded me we don't want to give out "false hope" by accepting experimental features we never intend to stabilize, which was what I was suggesting. Fair enough.

@Gabriella439
Contributor Author

Gabriella439 commented May 27, 2023

@thufschmitt: The corresponding blog post explains the problem with using a cron job to update the lockfile periodically:

However, there are a few issues with this approach:

  • It only works well for short-lived pull requests

    In other words, if you update the revision used for the full build once a day then typically only pull requests that are less than a day old will benefit from incremental builds.

    Specifically, what we’d really like is “branch-local” incremental builds. In other words if a longer-lived development branch were to deposit a few commits a day we’d like there to be a full rebuild once a day on that branch so that incremental builds against the tip of that development branch remain snappy.

  • It pollutes the git history

    If you bump the lockfile, say, once per day then that’s one junk commit that you’ve added to your git history every day.

  • It’s difficult to open source any useful automation around this

    If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-05-26-nix-team-meeting-minutes-58/28572/1

@sternenseemann
Member

highly impure

I also dislike this feature, but I think this assertion is unfair (I think @Gabriella439 already explained this somewhere, but I can't find the comment anymore): rev fixes the entire state of the git repository we want including the commit history and all its associated dates. Given rev and date, fetchGit should then be pure. Why does the Nix team think this is impure? I could not find a specific reason in the meeting minutes unfortunately.

@Gabriella439
Contributor Author

Also, even if this were impure, I don't see how that would argue against including this. builtins.fetchGit already permits impurity. For example, you can specify a branch without a revision, which is impure, and that's currently allowed.

@thufschmitt
Copy link
Member

The impurity wasn't about the fetching itself (which is actually pure if you specify an immutable rev as the baseline), but the use-case.

If the solution requires out-of-band machinery (e.g. some recurring cron job) to bump the lockfile you can’t provide a great user experience for open source projects. It only really works well for proprietary projects that can tolerate that complexity.

I don't really buy this argument. Every non-trivial project will have some CI system that could be used to update a tag every so often. It is indeed slightly more complicated, but not to the point of being a blocker if this approach is indeed worth it.
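For comparison, the Nix side of the tag-based alternative might look roughly like this. It is a sketch: `nightly` is a hypothetical tag name that a CI job would move periodically, and the URL is a placeholder. Because no `rev` is given, this input is unlocked, which is the impurity Gabriella439 raises further down in this thread.

```nix
# The tag name `nightly` is hypothetical and the URL is a placeholder.
# `ref` values starting with "refs/" are used as-is rather than being
# prefixed with "refs/heads/".
builtins.fetchGit {
  url = "https://example.com/your/repository.git";
  # No `rev`: the result changes whenever CI moves the tag.
  ref = "refs/tags/nightly";
}
```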

9999years pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jun 7, 2023

Lower required version

… so that it works against the upstream PR

I'll change the required version to an official release
if the PR is merged.

s/pkgs/pkg/g

… as caught by @cdepillabout

Co-authored-by: Dennis Gosnell <[email protected]>

Skip the use of `tar`

We can store the `dist` directory unpacked, which
speeds up the `dist` export/import.

This potentially requires more disk space *but* by storing
the files unpacked it may actually improve disk utilization
in some cases if `auto-optimise-store` is enabled by
permitting deduplication of `dist` files.
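For reference, the `auto-optimise-store` setting mentioned above can be enabled on NixOS with a configuration fragment like the one below. This is general Nix configuration, not part of this change.

```nix
# NixOS configuration fragment (e.g. in configuration.nix).  On other
# systems the equivalent is `auto-optimise-store = true` in nix.conf.
{
  # Hard-link identical files in the Nix store, so unpacked `dist`
  # directories that share files are only stored once.
  nix.settings.auto-optimise-store = true;
}
```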

Add an `installDist` phase

… which is disabled by default

The motivation for this is to bring the behavior of
`enableSeparateDistOutput` more in line with the other
options where it doesn't change *whether* or not something
is exported, but rather *where* it is exported.

Now `installDist` controls whether or not the `dist`
directory is exported.

Based on this discussion:

https://github.com/NixOS/nixpkgs/pull/203499/files#r1034150076

Document `interval` argument

… as suggested by @cdepillabout

s/for use for/for use with/

… based on feedback from @MaxGabriel

Move `installDistPhase` to `postPhases`

There are two reasons for doing this:

- We can get rid of the hack to remove the dist output from the outputs

- We can ensure that any changes that happen in the install phase are
  correctly reflected in the `dist` export

Disable dylib workaround for incremental build

Improve correctness of `incremental` function

Typically we don't want to just roll back the source code that is the
input for the Haskell package, because the dependencies for the package
may have changed.

In other words, if you roll back the source code for the top-level
package without also rolling back the Nix-supplied dependencies
for that build then you run the risk of an unexpected build failure
(due to an older version of the Haskell package being built against
a newer version of the Nix-supplied dependencies).

What you actually want to do is to roll back the entire repository
(i.e. the Haskell source code and the supporting Nix code) to ensure
that the Haskell source code and Nix code stay in sync.

This more generalized rollback complicates the UX for the
`incremental` function.  I did my best to try to streamline
the UX so that the user just needs to specify how to locate the
matching (older) package after a rollback.
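The whole-repository rollback can be sketched as follows. This is a hypothetical illustration, not the actual interface of `incremental`: the URL, the date, and the attribute name `my-package` are placeholders, and it assumes the repository's `default.nix` is a function returning an attribute set of packages.

```nix
let
  # Fetch the *whole* repository (Haskell and Nix code) as of an earlier
  # point in time.  `date` is the argument proposed in NixOS/nix#7362.
  olderRepository = builtins.fetchGit {
    url = "https://example.com/your/repository.git";  # placeholder
    ref = "main";
    date = "2022-12-01";  # in practice, snapped to a time boundary
  };

  # Evaluate the older Nix code, so the older Haskell source is built
  # against the dependencies it was originally pinned to.
  olderPackages = import olderRepository { };
in
  # "How to locate the matching (older) package": here, a plain attribute.
  olderPackages.my-package
```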

Make date relative to revision (if possible)

This way if you attempt to incrementally build an older revision
then the full rebuild will be relative to the older revision
instead of being relative to the present.

Add `extraFetchGitArgs` option

This in particular comes in handy if you want to specify
`ref = "main";` to ensure that the older build comes from
the `main` branch of your repository.
9999years pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Jun 8, 2023
@Gabriella439
Contributor Author

Gabriella439 commented Feb 28, 2024

We ran into another situation at work where this support would have been really helpful:

We currently use Nix to build a snapshot of our database with all migrations up to that point having been already run. However, this snapshot build is somewhat expensive and people are adding new migrations multiple times a day.

Ideally, this expensive build would always "snap" to the last commit before the most recent UTC midnight, so that it is only rebuilt once per day and we get a lot of cache reuse from that one build (even as new migrations are still being merged that day). If this PR were merged, we could implement that functionality using something like the following truncate function, powered by the extended builtins.fetchGit:

```nix
{ lib }:

# "Truncate" a `git` source repository to the nearest time interval (specified
# in seconds).  For example `truncate { interval = 60; src = …; }` will
# return the last commit before the last minute boundary.
#
# Note: the `date` argument used below is the one proposed in this PR.
{ interval
, src
, extraFetchGitArgs ? { }
}:

let
  srcAttributes =
    if lib.isAttrs src
    then src
    else { url = src; };

  url = srcAttributes.url or null;
  name = srcAttributes.name or null;
  submodules = srcAttributes.fetchSubmodules or null;

  # Only pass along the attributes that were actually provided.
  arguments = {
    ${ if name == null then null else "name" } = name;
    ${ if url == null then null else "url" } = url;
    ${ if submodules == null then null else "submodules" } = submodules;
  };

  # Snap relative to the requested revision when there is one (so that older
  # revisions snap to an older boundary), otherwise relative to the present.
  startingTime =
    if      srcAttributes ? rev
        &&  srcAttributes.rev != "0000000000000000000000000000000000000000"
    then
      let
        startingRepository = builtins.fetchGit (arguments // {
          inherit (srcAttributes) rev;
        });
      in
        startingRepository.lastModified
    else
      builtins.currentTime;

in
  builtins.fetchGit (arguments // {
    # Integer division rounds down to the start of the current interval.
    date = toString ((startingTime / interval) * interval);
  } // extraFetchGitArgs)
```

Then with that truncate function we would achieve the desired daily snapshot behavior like this:

```nix
lib.truncate {
  src = …;
  interval = 24 * 60 * 60;
}
```
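A weekly variant that also pins the branch via `extraFetchGitArgs` might look like this. It is a sketch under the same assumptions as the example above (`lib.truncate` is the function defined earlier in this comment and `date` comes from this PR); the URL is a placeholder. Because the Unix epoch (1970-01-01) fell on a Thursday, 7-day windows computed this way start on Thursdays at 00:00 UTC.

```nix
lib.truncate {
  # A plain string is treated as `{ url = src; }`; with no `rev`, the
  # boundary is computed from `builtins.currentTime`.
  src = "https://example.com/your/repository.git";
  interval = 7 * 24 * 60 * 60;
  # Assumes `date` is resolved against this branch's history.
  extraFetchGitArgs = { ref = "main"; };
}
```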

Now, this PR needs to be updated to work against the latest changes on master. I'm more than happy to do that myself, but I don't want to do the work if upstream isn't going to merge it, so I'd like to ask the Nix team to reconsider whether this can be merged.

Alternatively, if they know a better way to accomplish what we're trying to do, I would welcome that, but "setting up a cron job that sets a tag or adds a commit every day to our git history" is not really an acceptable solution for the reasons I mentioned in #7362 (comment).

@Gabriella439
Contributor Author

Gabriella439 commented Feb 28, 2024

I should probably also add one more thing that might not be obvious based on the previous discussion:

  • A cron job that updates a tag doesn't work, because stale branches (ones with a merge base preceding the tag) will then pull in migrations that they shouldn't have on their branch

    In other words, by depending on a moving tag you're depending on an unlocked (impure) source input with all of the issues that entails.

  • You can fix that impurity by adding junk commits daily to the trunk branch (instead of using tags) but that comes with the obvious downside of junk commits

    … and also still doesn't completely work for long-lived branches (with commits spanning multiple days), since those branches will be missing branch-local commits to update their snapshots. The only way to get a junk-commit-based solution that has parity with the date argument to builtins.fetchGit is to add daily junk commits to all development branches (which will generate merge conflicts!).

The date argument gives the best of both worlds because it preserves purity but still doesn't require setting up machinery to add noisy commits to every long-lived development branch.

wavewave pushed a commit to MercuryTechnologies/nixpkgs that referenced this pull request Mar 12, 2024
This adds a new `incremental` utility for Haskell CI that
supports incremental builds based on the approach outlined
in this blog post:

https://harry.garrood.me/blog/easy-incremental-haskell-ci-builds-with-ghc-9.4/

The basic idea is that instead of Nix doing a full build
for a package, we split every build into two builds:

- A full build at an older point in time

  e.g. a daily or weekly time boundary

- An incremental build relative to the last full build

  This incremental build reuses the build products left
  over from the most recent full build.

In order to do this, though, we need a way to "snap" a
package's `git` source input to an earlier point in time
(e.g. a daily boundary or weekly boundary).  This would
allow multiple incremental builds to share the same
full rebuild if they snap to the same time boundary.

The approach I went with to make that possible was to
extend Nix's `builtins.fetchGit` to support a new `date`
argument and you can find the corresponding PR for that
here:

NixOS/nix#7362

That is why the `incremental` utility added here requires
a sufficiently new version of Nix (one that would incorporate
that change, presuming it is merged).

This also requires GHC 9.4 or newer in order to pick up a fix
to GHC's change detection logic, as described in more detail
in the above blog post.

However, if you satisfy those requirements then this works
exactly the way you'd expect: all of the incremental builds
only have to build the diff since the last time boundary.
Moreover, if CI caches the full build then developers can
also run `nix build` locally and only have to build the diff, too.

---

Lower required version

… so that it works against the upstream PR

I'll change the required version to an official release
if the PR is merged.

---

s/pkgs/pkg/g

… as caught by @cdepillabout

Co-authored-by: Dennis Gosnell <[email protected]>

---

Skip the use of `tar`

We can store the `dist` directory decompressed, which
speeds up the dist export/import

This potentially requires more disk space *but* by storing
the files unpacked it may actually improve disk utilization
in some cases if `auto-optimise-store` is enabled by
permitting deduplication of `dist` files.

---

Add an `installDist` phase

… which is disabled by default

The motivation for this is to bring the behavior of
`enableSeparateDistOutput` more in line with the other
options where it doesn't change *whether* or not something
is exported, but rather *where* it is exported.

Now `installDist` controls whether or not the `dist`
directory is exported.

Based on this discussion:

https://github.com/NixOS/nixpkgs/pull/203499/files#r1034150076

---

Document `interval` argument

… as suggested by @cdepillabout

---

s/for use for/for use with/

… based on feedback from @MaxGabriel

---

Move `installDistPhase` to `postPhases`

There are two reasons for doing this:

- We can get rid of the hack to remove the dist output from the outputs

- We can ensure that any changes that happen in the install phase are
  correctly reflected in the `dist` export

---

Disable dylib workaround for incremental build

---

Improve correctness of `incremental` function

Typically we don't want to just roll back the source code that is the
input for the Haskell package because the dependencies for the package
may have changed

In other words, if you roll back the source code for the top-level
package without also rolling back the Nix-supplied dependencies
for that build then you run the risk of an unexpected build failure
(due to an older version of the Haskell package being built against
a newer version of the Nix-supplied dependencies).

What you actually want to do is to roll back the entire repository
(i.e. the Haskell source code and the supporting Nix code) to ensure
that the Haskell source code and Nix code stay in sync.

This more generalized rollback complicates the UX for the
`incremental` function.  I did my best to try to streamline
the UX so that the user just needs to specify how to locate the
matching (older) package after a rollback.

---

Make date relative to revision (if possible)

This way if you attempt to incrementally build an older revision
then the full rebuild will be relative to the older revision
instead of being relative to the present.

---

Add `extraFetchGitArgs` option

This in particular comes in handy if you want to specify
`ref = "main";` to ensure that the older build comes from
the `main` branch of your repository.
@pwm

pwm commented Mar 25, 2024

Incremental builds would be a game changer for every CI running Nix, saving time and resources. As discussed above, this PR does not add any impurities that are not already there. I'm echoing Gabriella's request to please reconsider merging this.

@tomberek
Contributor

tomberek commented Aug 17, 2024

I wonder if this would be easier to support if it only applies when you already have a rev:

builtins.fetchGit {
  url = "some-url";
  rev = "abb08192ed875ef73fa66029994aa2f6700befd0";
  date = "1 day before";
}

It does mean that (builtins.fetchGit {rev = some-rev; ...}).rev != some-rev, and date is then seen more as a deterministic modifier than as an identifier that changes.

The current PR uses ref to first obtain a rev, and then modifies that rev to roll back time.

builtins.fetchGit {
  url = "some-url";
  ref = "SOMEBRANCH";
  date = "1 day before";
}

I'm leaning toward accepting it either as-is or with the behavior restricted to when rev is provided.
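Restricting `date` to a known `rev` can make the rollback a pure function of the revision itself. The sketch below assumes that semantics. It reads the revision's committer timestamp, rounds it down to the interval, and takes the newest commit reachable from that revision at or before the boundary. This roughly mirrors the `startingTime` logic earlier in the thread. However, `base_rev` and the repository contents are hypothetical and are not the PR's implementation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a revision to the newest commit reachable from it
# whose committer date is at or before the revision's own committer date
# rounded down to a multiple of the interval.
base_rev() {
  local repo=$1 rev=$2 interval=$3 committed boundary
  committed=$(git -C "$repo" show -s --format=%ct "$rev")
  boundary=$(( (committed / interval) * interval ))
  git -C "$repo" rev-list --before="$boundary +0000" -1 "$rev"
}

# Throwaway repository with a linear history at fixed UTC times:
# 2024-02-27 12:00, 2024-02-28 09:00, 2024-02-28 18:00, 2024-02-29 10:00.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -c init.defaultBranch=main init -q "$repo"
for stamp in 1709035200 1709110800 1709143200 1709200800; do
  GIT_AUTHOR_DATE="$stamp +0000" GIT_COMMITTER_DATE="$stamp +0000" \
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q --allow-empty -m "commit at $stamp"
done

day=$(( 24 * 60 * 60 ))
for rev in HEAD HEAD~1 HEAD~2; do
  echo "$rev -> $(git -C "$repo" log -1 --format=%s "$(base_rev "$repo" "$rev" "$day")")"
done
```

Both revisions committed on 2024-02-28 (`HEAD~1` and `HEAD~2`) resolve to the same base commit. That shared base is what would let their incremental builds reuse one full build.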

Timezones

Another random question: is the timestamp specification sensitive to things like timezones and locales? If someone's location affected this, it would be much harder to consider the date to be a deterministic modifier.

@tomberek
Contributor

tomberek commented Aug 17, 2024

Hrm....

[tom@tframe:~/nix]$ ( export TZ=CET ; git rev-list --before 'noon' -1 HEAD )
b7d80d002f8a3885e9f98b5944e283c2b7f861e6

[tom@tframe:~/nix]$ ( export TZ=UTC ; git rev-list --before 'noon' -1 HEAD )
c45859864742dea6cacf84b83b577c3be0744bf1

[tom@tframe:~/nix]$ ( export TZ=WET ; git rev-list --before 'noon' -1 HEAD )
2ab93fd5fda3f61f6b1560db7da21a34dbd13b7d

If we do this, we need to either ensure a consistent TZ or disallow some of the special forms (https://github.com/git/git/blob/master/date.c#L1197-L1204). "yesterday" is just an offset, but "tea" is a specific time of day most conducive to hot beverages, which is very location-dependent.
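One way to avoid the time-zone dependence is to pass only absolute timestamps. The sketch below assumes the `date` value goes through git's ordinary date parsing, as it does with `git rev-list --before`. The throwaway repository, its commit times, and the helper `commit_at` are made up for illustration. Because the cutoff is a Unix timestamp with an explicit `+0000` offset, it should resolve to the same commit under each of the `CET`, `UTC`, and `WET` settings used in the transcript above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository whose commits have fixed, known UTC timestamps.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -c init.defaultBranch=main init -q "$repo"

# Hypothetical helper: create an empty commit at a given Unix time (UTC).
commit_at() {
  local stamp=$1 message=$2
  GIT_AUTHOR_DATE="$stamp +0000" GIT_COMMITTER_DATE="$stamp +0000" \
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q --allow-empty -m "$message"
}

commit_at 1709078400 "2024-02-28 00:00 UTC"
commit_at 1709121600 "2024-02-28 12:00 UTC"
commit_at 1709150400 "2024-02-28 20:00 UTC"

# Absolute cutoff, 2024-02-28 13:00 UTC, in git's "<unix-seconds> <offset>" form.
cutoff="1709125200 +0000"

# Resolve the cutoff under the same zones as the transcript; every line
# should print the same hash (the 12:00 UTC commit).
for zone in CET UTC WET; do
  echo "$zone $(TZ=$zone git -C "$repo" rev-list --before="$cutoff" -1 HEAD)"
done
```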

@Gabriella439
Contributor Author

I'd also be okay with always requiring the --impure flag to use this feature

@Aleksanaa
Member

Aleksanaa commented Aug 19, 2024

What about this: warn if such forms are used in pure eval and assume UTC, but use local time in impure eval?

No, such keywords should be forbidden in pure eval
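Forbidding location-dependent keywords could be done with an allowlist instead of a denylist of special forms. Below is a hypothetical check, not part of the PR or of Nix, that accepts only an absolute `<unix-seconds> <+hhmm|-hhmm>` value. This choice is deliberately strict. It also rejects relative forms such as `yesterday`, which do not depend on the time zone but do depend on the current time.

```shell
#!/usr/bin/env bash
# Hypothetical pure-eval guard: accept only "<unix-seconds> <+hhmm|-hhmm>".
is_absolute_date() {
  local re='^[0-9]+ [+-][0-9]{4}$'
  [[ $1 =~ $re ]]
}

for spec in "1709125200 +0000" "noon" "tea" "yesterday"; do
  if is_absolute_date "$spec"; then
    echo "allowed:  $spec"
  else
    echo "rejected: $spec"
  fi
done
```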

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-08-21-nix-team-meeting-minutes-171/50950/1

@edolstra edolstra self-requested a review as a code owner November 12, 2024 19:47
@thomie
Contributor

thomie commented Mar 13, 2025

What is the current status of this feature?

From the 2024-08-21 Nix team meeting minutes:

What would be the right primitives for this?

Idea: perhaps fetchGit/fetchTree could support this in a pure way.

inputs.a = { type = "git"; ref = "main"; }  # lock: rev = ...
inputs.a-base = { type = "git"; rev = inputs.a.rev; roundDown = "1 day"; }

In this example, the roundDown attribute would implement the desired behavior and could be implemented

... which gives hope?

Labels: feature (Feature request or proposal), fetching (Networking with the outside (non-Nix) world, input locking)
Projects: None yet