incr.comp.: Use ICH-based DepNodes in order to make them PODs

The current implementation of `DepNode` has a `DefId` or something more complex as its identifier. This has a few downsides:
- `DefId`s are bound to a specific compilation session. When the dependency graph of a previous session is loaded, all nodes in it have to be "re-traced", which means that their `DefPath` has to be mapped to a `DefId` in the current compilation session. The has some cost at runtime and complicates the implementation (because we have to generate and store the `DefPathDirectory`). 
- Re-tracing a `DepNode` can also fail because the item it refers to does not exist anymore in the current compilation session. This is not a conceptual problem but it is an annoyance that `DepNodes` from a previous dep-graph cannot easily be represented. At the moment this is solved by making `DepNode` generic over the type of identifier it uses (either `DefId` or `DefPath`) but that is a burden.
- Independently of `DefId`, some `DepNode` variants are expensive to copy because they contain vectors.

This is a proposal to use a simplified design that uses a generic, globally unique fingerprint as the disambiguator for `DepNode`:
```rust
enum DepNodeKind {
    Krate,
    Hir,
    HirBody,
    MetaData,
    ...
}

struct DepNode {
    kind: DepNodeKind,
    // 128+ bit hash value identifying the DepNode
    fingerprint: Fingerprint,
}
```

Since this is using a stable hash (like the ICH), a `DepNode` like this is valid across compilation sessions and does not need any re-tracing. It's also "plain old data" in the sense that it contains no pointers or anything else that is context dependent. As a consequence, it can be easily copied, shared between threads, and memory mapped, something that is not possible with the current design.

The fingerprint-based approach has a few potential downsides but all of them can be addressed adequately, I think:
1. A truly stable fingerprint/hash value is not trivial to compute.<br>**Solution:** We already have the whole ICH infrastructure available which can handle anything we throw at it.
2. There's a runtime cost to turning a `DefId` or other identifiers into a fingerprint.<br>**Solution:** In the vast majority of cases, it is really just one `DefId` that needs to be hashed. In this case we already have the hash available (as the `DefPathHash`) and access it via a simple array-lookup. Additionally, all `DepNode` are cached in the dependency graph once they are created and can be looked up via a 32-bit `DepNodeIndex` if the need arises.
3. Using a hash value introduces the risk of hash collisions.<br>**Solution:** This is already a risk for incremental compilation and is mitigated by using a high quality hash function with low enough collision probability. Risk can be adjusted by using fingerprints with more bits. 
4. A fingerprint is opaque while a `DefId` allows to reconstruct which item it points to. This is bad for debugging output.<br>**Solution:** This is can be mitigated by just constructing lookup tables for mapping a Fingerprint back to its ingredients if `-Zquery-dep-graph` is specified.

cc @nikomatsakis and @eddyb 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

incr.comp.: Use ICH-based DepNodes in order to make them PODs #42294

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions