Closed
Description
Feature or enhancement
Proposal:
Counting single characters in a string is very useful, for instance when calculating the GC content of a DNA sequence.
def gc_content(sequence: str) -> float:
    upper_seq = sequence.upper()
    a_count = upper_seq.count('A')
    c_count = upper_seq.count('C')
    g_count = upper_seq.count('G')
    t_count = upper_seq.count('T')
    # Unknown N bases should not influence the GC content, do not use len(sequence)
    total = a_count + c_count + g_count + t_count
    # True division returns a float (the fraction of G and C bases)
    return (c_count + g_count) / total
Another example would be counting newline characters.
The current code counts one character at a time.
static inline Py_ssize_t
STRINGLIB(count_char)(const STRINGLIB_CHAR *s, Py_ssize_t n,
                      const STRINGLIB_CHAR p0, Py_ssize_t maxcount)
{
    Py_ssize_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}
By providing the appropriate hints to the compiler, the function can be sped up significantly.
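As a rough illustration of what such a hint could look like (a sketch under assumptions, not necessarily the change that ends up in CPython): the early return on maxcount makes the loop's trip count data-dependent, which tends to prevent compilers from vectorizing it. A variant without that check gives the optimizer a plain reduction loop. The names count_char_no_limit and count_char_capped are hypothetical, and plain C types stand in for STRINGLIB_CHAR and Py_ssize_t.

```c
#include <stddef.h>

/* Hypothetical helper, not CPython's actual code: counts occurrences of p0
   in s[0..n) with no early exit. */
static ptrdiff_t
count_char_no_limit(const unsigned char *s, ptrdiff_t n, unsigned char p0)
{
    ptrdiff_t count = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        /* The comparison evaluates to 0 or 1, so the body is a simple
           accumulation that a compiler may turn into SIMD code. */
        count += (s[i] == p0);
    }
    return count;
}

/* Hypothetical wrapper that caps the result at maxcount (assumed > 0).
   Unlike the original loop, it always scans the whole buffer. */
static ptrdiff_t
count_char_capped(const unsigned char *s, ptrdiff_t n,
                  unsigned char p0, ptrdiff_t maxcount)
{
    ptrdiff_t count = count_char_no_limit(s, n, p0);
    return count < maxcount ? count : maxcount;
}
```

Whether the loop is actually vectorized depends on the compiler and optimization level, so any speedup would need to be measured. The capped wrapper trades the early exit for a full scan, which only pays off when maxcount is rarely reached.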
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response