This document summarizes a paper that proposes parallel programming techniques to speed up the discrete cosine transform (DCT). The paper combines two levels of parallelism: thread-level parallelism, which distributes image blocks across multiple processor cores, and vector-level parallelism, which applies SIMD operations within each core using AVX instructions. Both levels are expressed with Cilk Plus. The authors estimate that this multi-level approach could theoretically achieve a speedup of up to 32 times over a serial scalar implementation.