The document examines how data volume affects Spark-based data analytics on scale-up servers, highlighting performance bottlenecks and the diminishing returns of executors with high core counts. Key findings indicate that performance degrades significantly as data volume increases, due to garbage collection and file I/O. The document also offers insights into CPU utilization and micro-architectural performance variations. Suggested directions for improvement include NUMA-aware scheduling and memory architecture optimizations.