Use "prefetch" CPU instructions during the marking phase of the GC #129201

Closed
@nascheme

Description

Feature or enhancement

Proposal:

This change is partially inspired by a similar change made to the OCaml GC. Sam wrote a prototype implementation of the idea, and it showed promise. Now that the free-threaded GC has a "mark alive" phase, adding a prefetch buffer is easier. Limiting prefetching to the marking phase should provide most of the benefit for a minimal amount of added code complexity.

Prefetching is expected to help only when the working set of objects exceeds the size of the CPU cache. When the working set fits in the cache, the prefetch logic should not hurt performance much. Traversing the object graph would become slightly more complex, because the code must choose between the prefetch buffer and the stack. However, on small object graphs the time spent in the GC is also small.
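To make the buffer-or-stack choice concrete, here is a minimal sketch of a marking loop with a prefetch buffer. It is an illustration under stated assumptions, not the CPython implementation: `node_t`, `mark_state_t`, `mark_reachable`, and the tuning constants `BUF_SIZE` and `BUF_LOW` are all hypothetical names and values. `__builtin_prefetch` is the GCC/Clang builtin; other compilers fall back to a no-op here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH(p) __builtin_prefetch(p)
#else
#  define PREFETCH(p) ((void)(p))   /* no-op fallback on other compilers */
#endif

/* Hypothetical object type for illustration; not CPython's PyObject. */
typedef struct node {
    int marked;
    size_t n_children;
    struct node **children;
} node_t;

#define BUF_SIZE 16u   /* ring capacity; must be a power of two (assumed value) */
#define BUF_LOW   4u   /* refill from the stack at or below this count (assumed) */

typedef struct {
    node_t *ring[BUF_SIZE];   /* FIFO of pointers whose prefetch is in flight */
    unsigned head, tail;      /* live entries are [head, tail) */
    node_t **stack;           /* overflow stack, grown on demand */
    size_t len, cap;
} mark_state_t;

static unsigned buf_count(const mark_state_t *s) { return s->tail - s->head; }

static void buf_push(mark_state_t *s, node_t *n) {
    assert(buf_count(s) < BUF_SIZE);
    PREFETCH(n);   /* start the memory load now; the object is read later */
    s->ring[s->tail++ % BUF_SIZE] = n;
}

static node_t *buf_pop(mark_state_t *s) {
    assert(buf_count(s) > 0);
    return s->ring[s->head++ % BUF_SIZE];
}

static int stack_push(mark_state_t *s, node_t *n) {
    if (s->len == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 64;
        node_t **p = realloc(s->stack, cap * sizeof *p);
        if (p == NULL) return -1;
        s->stack = p;
        s->cap = cap;
    }
    s->stack[s->len++] = n;
    return 0;
}

/* Prefer the prefetch buffer; spill to the stack when the buffer is full. */
static int enqueue(mark_state_t *s, node_t *n) {
    if (buf_count(s) < BUF_SIZE) { buf_push(s, n); return 0; }
    return stack_push(s, n);
}

/* Mark everything reachable from root. Returns 0, or -1 if out of memory. */
int mark_reachable(node_t *root) {
    mark_state_t s = {0};
    int rc = 0;
    if (root == NULL) return 0;
    buf_push(&s, root);
    for (;;) {
        /* Keep several prefetches in flight by refilling from the stack. */
        if (buf_count(&s) <= BUF_LOW) {
            while (s.len > 0 && buf_count(&s) < BUF_SIZE)
                buf_push(&s, s.stack[--s.len]);
        }
        if (buf_count(&s) == 0) break;
        node_t *n = buf_pop(&s);   /* oldest entry: its prefetch had the most time */
        if (n->marked) continue;   /* first read of the object's memory */
        n->marked = 1;
        for (size_t i = 0; i < n->n_children; i++) {
            node_t *child = n->children[i];
            if (child != NULL && enqueue(&s, child) < 0) { rc = -1; goto done; }
        }
    }
done:
    free(s.stack);
    return rc;
}
```

The sketch tests the mark bit only when an object is popped, not when it is enqueued, so the first read of an object's memory happens after its prefetch has had time to complete. When the whole graph fits in the cache, the buffer adds only a few pointer copies per object.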

Note that this change is proposed only for the free-threaded version of the cyclic GC. It might be possible to use prefetching in the default build GC, but the design would need to be fairly different because of the next/prev GC linked lists. Anyone who wants to try that optimization should open a separate issue.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
