This document presents an algorithm for extracting data from open source software project repositories on GitHub in order to build duration estimation models. For each release period, the algorithm extracts data on contributors, commits, lines of code added and removed, and the active days of each contributor. These data are then used to build linear regression models that estimate project duration from the number of commits made by each contributor type (full-time, part-time, occasional). The algorithm is tested on data extracted from 21 releases of the WordPress project hosted on GitHub.
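The modelling step summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that contributors are classified by the share of days in a release period on which they were active, with hypothetical thresholds (`FULL_TIME_SHARE`, `PART_TIME_SHARE`) that the text does not specify. It then fits an ordinary least squares model of release duration on commits per contributor type. All function names are illustrative, and the numbers in the usage example are synthetic, not WordPress data.

```python
# Hypothetical thresholds on the share of active days in a release period.
# The source does not state how contributor types are defined.
FULL_TIME_SHARE = 0.5
PART_TIME_SHARE = 0.1

TYPES = ("full_time", "part_time", "occasional")


def contributor_type(active_days, release_days):
    """Classify one contributor by the share of days they were active."""
    share = active_days / release_days
    if share >= FULL_TIME_SHARE:
        return "full_time"
    if share >= PART_TIME_SHARE:
        return "part_time"
    return "occasional"


def commits_by_type(contributors, release_days):
    """Sum commits per contributor type for one release.

    contributors: iterable of (active_days, commits) pairs.
    """
    totals = {t: 0 for t in TYPES}
    for active_days, commits in contributors:
        totals[contributor_type(active_days, release_days)] += commits
    return totals


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations.

    rows: one feature list per release; targets: one duration per release.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefficients, row):
    """Estimate duration for one release from its commits per type."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


if __name__ == "__main__":
    # Synthetic releases: (full-time, part-time, occasional) commit counts
    # and durations in days. Illustrative values only.
    features = [(100, 40, 10), (80, 60, 20), (120, 30, 5), (60, 50, 30), (90, 20, 15)]
    durations = [70.0, 64.0, 77.0, 53.0, 60.0]
    model = fit_ols(features, durations)
    print(predict(model, (95, 35, 12)))
```

With only a few releases per project, a three-predictor model of this kind has few degrees of freedom, so in practice the fit would need to be checked against held-out releases before the coefficients are interpreted.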