So for either column, I would like to get the longest ranges of contiguous valid data, e.g. rows 6 to 10 (as indexes or as the actual rows) for column 1.
Or, for the missing values, rows 2 to 6 and 8 to 11 for column 2.
I started to think about running through the lines etc. but I guess that there’s a more native way to do that.
function max_valid_run(v)
    max_len = 0
    curr_len = 0
    for x in v
        if x !== missing && !isnan(x)
            curr_len += 1
            max_len = max(max_len, curr_len)
        else
            curr_len = 0
        end
    end
    return max_len
end

mapcols(col -> max_valid_run(col), df)
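There is probably a more efficient way of checking the columns, but the order of the two checks needs some care: isnan(missing) returns missing rather than false, so !isnan(x) && x !== missing would throw a TypeError whenever x is missing. Testing x !== missing first lets && short-circuit before isnan is ever called.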
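Note that max_valid_run only gives the length of the longest run, while the question asks for the row ranges. A minimal sketch of one way to get the ranges follows, assuming numeric columns that may contain missing or NaN. The names runs_where, isvalid_entry, valid_runs and longest_valid_run are made up for this example and are not part of Base or DataFrames.

```julia
# Hypothetical helper: collect the index ranges of every contiguous run
# of elements of `v` for which `pred` returns true.
# Assumes `v` uses ordinary 1-based indexing (e.g. a Vector).
function runs_where(pred, v)
    runs = UnitRange{Int}[]
    start = 0                          # 0 means "not inside a run"
    for (i, x) in enumerate(v)
        if pred(x)
            start == 0 && (start = i)  # a new run begins here
        elseif start != 0
            push!(runs, start:i-1)     # the run ended on the previous element
            start = 0
        end
    end
    start != 0 && push!(runs, start:length(v))  # run reaching the last element
    return runs
end

# Valid means neither missing nor NaN; the missing check has to come first.
isvalid_entry(x) = !ismissing(x) && !isnan(x)

valid_runs(v) = runs_where(isvalid_entry, v)

# Longest valid run (the first one on ties), or `nothing` if there is none.
function longest_valid_run(v)
    runs = valid_runs(v)
    return isempty(runs) ? nothing : runs[argmax(length.(runs))]
end

v = [missing, 1.0, 2.0, NaN, 3.0, 4.0, 5.0, missing]
valid_runs(v)             # [2:3, 5:7]
longest_valid_run(v)      # 5:7
runs_where(ismissing, v)  # [1:1, 8:8]
```

Passing the predicate as an argument means the same loop covers both parts of the question: valid_runs for the valid data and runs_where(ismissing, col) for the missing stretches. It should work on a DataFrame column vector the same way it does on v.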