
Introduce standard sharded SCIP index format #143

Closed
@varungandhi-src


Protobuf has a size limit of 2GB per message.

A single index for Chromium is about 6GB, triggering an error in scip-clang.

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/message_lite.cc:402] scip.Index exceeded maximum protobuf size of 2GB: 6138839817

The indexer needs to shard the data ahead-of-time and emit that instead.
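As a rough check on how many shards an index of this size needs, here is a small sketch. It assumes the limit is INT_MAX bytes (2^31 - 1) per serialized message and ignores per-shard overhead such as the repeated metadata; the function name is made up for illustration.

```python
PROTOBUF_MAX_BYTES = 2**31 - 1  # assumed per-message limit (INT_MAX)
CHROMIUM_INDEX_BYTES = 6138839817  # size reported in the error above


def min_shard_count(total_bytes: int, max_bytes: int = PROTOBUF_MAX_BYTES) -> int:
    """Lower bound on the number of shards: ceiling division of total by the limit."""
    return -(-total_bytes // max_bytes)


print(min_shard_count(CHROMIUM_INDEX_BYTES))  # 3
```

In practice an indexer would probably aim for shards well below the limit, so the real count would be higher.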

Proposed sharded index format:

  • Directory containing one or more *.shard.scip files, each of which contains a scip.Index. All shards must have the same metadata.
  • The default name for the outer directory will be index.scip.
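One way an indexer might plan its shards is sketched below. This is only an illustration: the greedy grouping, the 1 GiB budget, and the zero-padded file names are assumptions, not part of the proposal (which only requires `*.shard.scip` names). Zero-padding is used so that lexicographic order of the names matches the order in which shards were written.

```python
from typing import List

MAX_SHARD_BYTES = 1 << 30  # assumed budget, kept well under protobuf's 2GB limit


def plan_shards(doc_sizes: List[int], max_bytes: int = MAX_SHARD_BYTES) -> List[List[int]]:
    """Greedily group document indices so each group's total size stays within max_bytes.

    A single document larger than max_bytes still gets a shard of its own.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(doc_sizes):
        # Close the current shard if adding this document would exceed the budget.
        if current and current_size + size > max_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        shards.append(current)
    return shards


def shard_file_name(index: int, total: int) -> str:
    """Hypothetical naming scheme: zero-padded shard number plus the .shard.scip suffix."""
    width = len(str(max(total - 1, 0)))
    return f"{index:0{width}d}.shard.scip"
```

For example, `plan_shards([4, 4, 4, 10, 1], max_bytes=8)` groups the documents as `[[0, 1], [2], [3], [4]]`. The sizes passed in would be estimates of each document's serialized size, so the budget should leave headroom for field tags and the metadata repeated in every shard.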

src-cli can be pointed to index.scip with -file (maybe we should rename this flag?). It will be responsible for compressing/tarring and uploading the index.
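The packaging step could look roughly like the sketch below. It is not how src-cli works today: the function name, the gzip-compressed tar format, and the choice to include only `*.shard.scip` files are all assumptions made for illustration.

```python
import os
import tarfile

SHARD_SUFFIX = ".shard.scip"


def pack_index(index_dir: str, out_path: str) -> None:
    """Write the shard files in index_dir to a gzip-compressed tarball at out_path."""
    names = sorted(n for n in os.listdir(index_dir) if n.endswith(SHARD_SUFFIX))
    if not names:
        raise ValueError(f"no *{SHARD_SUFFIX} files in {index_dir}")
    with tarfile.open(out_path, "w:gz") as tar:
        for name in names:
            # Store shards at the archive root, in lexicographic order of file name.
            tar.add(os.path.join(index_dir, name), arcname=name)
```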

The behavior should be as-if:

  • The metadata field of a scip.Index was populated by the metadata field in some shard.
  • All other fields of scip.Index (currently documents and external_symbols) in the *.shard.scip files in the archive are processed in lexicographic order of shard file name.
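The as-if behavior above can be sketched from the consumer's side as follows. This is a minimal illustration, not an existing API: `iter_shards` is a made-up name, `parse` stands in for whatever deserializes a `scip.Index` (for example, generated protobuf bindings), and shards are modeled as plain dictionaries. Shards are yielded one at a time rather than merged, so the consumer never has to hold a single message above the 2GB limit.

```python
import os
from typing import Any, Callable, Dict, Iterator

SHARD_SUFFIX = ".shard.scip"


def iter_shards(
    index_dir: str, parse: Callable[[bytes], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield parsed shards in lexicographic order of file name.

    Raises ValueError if the directory has no shards or if a shard's
    metadata differs from the metadata of the first shard.
    """
    names = sorted(n for n in os.listdir(index_dir) if n.endswith(SHARD_SUFFIX))
    if not names:
        raise ValueError(f"no *{SHARD_SUFFIX} files in {index_dir}")
    metadata = None
    for name in names:
        with open(os.path.join(index_dir, name), "rb") as f:
            shard = parse(f.read())
        if metadata is None:
            # Any shard may supply the metadata, since all shards must agree.
            metadata = shard.get("metadata")
        elif shard.get("metadata") != metadata:
            raise ValueError(f"{name}: metadata differs from earlier shards")
        yield shard
```

Python's `sorted` compares strings by code point, which is one reasonable reading of "lexicographic"; the format documentation would need to pin this down so that all consumers agree.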

We should add documentation about this format in the README or in the scip.proto file.

This feature requires changes in:

Maybe we should also mention this feature addition in various CHANGELOGs.

Labels: enhancement (New feature or request)