The document presents a study of a data extraction algorithm for estimating software project durations from the historical data of GitHub repositories, focusing on WordPress projects. It highlights the difficulty of obtaining reliable data for estimation models and proposes an approach that categorizes contributors by their activity level relative to the release duration. The results indicate that the proposed algorithm improves the ability to build accurate duration estimation models from open-source software project data.
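
To make the idea of categorizing contributors by activity relative to release duration more concrete, here is a minimal Python sketch. It is not the study's algorithm: the ratio definition (span of a contributor's commits divided by the release length), the function names `activity_ratio` and `categorize`, the category labels, and the thresholds 0.75 and 0.25 are all hypothetical choices made for illustration.

```python
from datetime import date


def activity_ratio(commit_dates, release_start, release_end):
    """Return the share of the release window spanned by a contributor's commits.

    Assumption: "activity level" is approximated by the number of days between
    the first and last commit inside the window, divided by the window length.
    """
    release_days = (release_end - release_start).days
    if release_days <= 0:
        raise ValueError("release_end must be after release_start")

    # Keep only commits that fall inside the release window.
    in_window = [d for d in commit_dates if release_start <= d <= release_end]
    if not in_window:
        return 0.0

    active_days = (max(in_window) - min(in_window)).days + 1
    return min(active_days / release_days, 1.0)


def categorize(ratio, high=0.75, low=0.25):
    """Map an activity ratio to a label; thresholds and labels are hypothetical."""
    if ratio >= high:
        return "core"
    if ratio >= low:
        return "regular"
    return "occasional"


# Example with a made-up 100-day release window.
start, end = date(2020, 1, 1), date(2020, 4, 10)
long_term = activity_ratio([date(2020, 1, 1), date(2020, 3, 30)], start, end)
one_off = activity_ratio([date(2020, 2, 15)], start, end)
print(categorize(long_term), categorize(one_off))
```

In a sketch like this, the per-category contributor counts for each release could then serve as inputs to a duration estimation model, though which features the study actually uses is not stated here.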