The paper discusses the challenges of feature extraction and feature selection in cybersecurity and stresses their importance for machine-learning-based threat detection. It presents a method that uses PySpark, the Python API for Apache Spark, to automate both tasks and to cope with heterogeneous data collected from diverse network sensors. By streamlining data processing, the research aims to support real-time anomaly detection and to make security analytics more efficient.