The count method on strings, bytes, bytearray etc. can be significantly faster #120397

Closed
@rhpvorderman

Description

Feature or enhancement

Proposal:

Counting single characters in a string is very useful, for instance when calculating the GC content of a DNA sequence.

def gc_content(sequence: str) -> float:
    upper_seq = sequence.upper()
    a_count = upper_seq.count('A')
    c_count = upper_seq.count('C')
    g_count = upper_seq.count('G')
    t_count = upper_seq.count('T')
    # Unknown N bases should not influence the GC content, so do not use len(sequence).
    total = a_count + c_count + g_count + t_count
    # Fraction of known bases that are G or C (a float, not an int).
    return (c_count + g_count) / total

Another example would be counting newline characters.

The current code counts one character at a time and returns early once maxcount is reached.

static inline Py_ssize_t
STRINGLIB(count_char)(const STRINGLIB_CHAR *s, Py_ssize_t n,
                      const STRINGLIB_CHAR p0, Py_ssize_t maxcount)
{
    Py_ssize_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}

By providing the appropriate hints to the compiler, the function can be sped up significantly.
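
The issue does not say which hints are meant. One plausible reading, sketched here as an assumption rather than as the actual change, is that the early return on maxcount keeps the compiler from auto-vectorizing the loop, and that a variant without the cap leaves a plain reduction loop the compiler can vectorize when optimization is enabled. The typedefs `sketch_char` and `sketch_ssize_t` and the helper `count_char_no_maxcount` are hypothetical stand-ins so the example compiles outside CPython; they are not taken from the stringlib sources.

```c
#include <stddef.h>

/* Stand-ins for CPython's STRINGLIB_CHAR and Py_ssize_t so the sketch
   compiles on its own; these typedefs are assumptions, not CPython code. */
typedef unsigned char sketch_char;
typedef ptrdiff_t sketch_ssize_t;

/* Hypothetical variant without the maxcount cap. With no early return,
   the loop is a simple reduction that an optimizing compiler may be able
   to auto-vectorize. */
static inline sketch_ssize_t
count_char_no_maxcount(const sketch_char *s, sketch_ssize_t n,
                       const sketch_char p0)
{
    sketch_ssize_t count = 0;
    for (sketch_ssize_t i = 0; i < n; i++) {
        count += (s[i] == p0);
    }
    return count;
}

/* Dispatcher: at most n matches are possible, so when maxcount >= n the
   cap can never cut the result short and the uncapped loop is safe.
   Otherwise fall back to the original early-exit loop. */
static inline sketch_ssize_t
count_char(const sketch_char *s, sketch_ssize_t n,
           const sketch_char p0, sketch_ssize_t maxcount)
{
    if (maxcount >= n) {
        return count_char_no_maxcount(s, n, p0);
    }
    sketch_ssize_t count = 0;
    for (sketch_ssize_t i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}
```

Whether the uncapped loop is actually vectorized depends on the compiler, its version, and the optimization flags, so any speedup should be confirmed by benchmarking rather than assumed.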

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Labels: performance, type-feature