TorchDynamo Performance DashBoard #93794
Compilation Profile
The tables show the worst 50 models for different metrics.

Compilation Latency (dtype=float32, unit=seconds)
Peak Memory (dtype=float32, unit=GB)
Number of graphs (dtype=float32, unit=graphs)
Performance Dashboard for float32 precision

Executive Summary
We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
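The summary statistics above can be sketched in a few lines. This is a hypothetical reconstruction, not the dashboard's actual code: the record fields and values below are invented for illustration, and the real pipeline derives them from the benchmark logs.

```python
import statistics

# Hypothetical per-model results for one (suite, compiler) pair.
results = [
    {"name": "model_a", "accuracy": "pass", "speedup": 1.50, "compile_s": 30.0, "mem_ratio": 0.95},
    {"name": "model_b", "accuracy": "pass", "speedup": 1.20, "compile_s": 45.0, "mem_ratio": 1.10},
    {"name": "model_c", "accuracy": "fail", "speedup": 0.00, "compile_s": 0.0,  "mem_ratio": 0.0},
]

# Caveat from the report: models that fail the accuracy check are removed
# before the performance, latency and memory statistics are computed.
passing = [r for r in results if r["accuracy"] == "pass"]

passrate = len(passing) / len(results)

# Speedup is a ratio (eager time / compiled time), so the geometric mean
# is the natural aggregate across models.
geomean_speedup = statistics.geometric_mean(r["speedup"] for r in passing)

mean_compile_s = statistics.mean(r["compile_s"] for r in passing)
mean_mem_ratio = statistics.mean(r["mem_ratio"] for r in passing)

print(f"passrate={passrate:.0%}, geomean speedup={geomean_speedup:.2f}x, "
      f"mean compile={mean_compile_s:.1f}s, mean mem ratio={mean_mem_ratio:.2f}")
```

A memory compression ratio above 1.0 means the compiled run peaked lower than eager, which is why the report labels it "higher is better".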
torchbench suite with float32 precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
huggingface suite with float32 precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
timm_models suite with float32 precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance graphs
Performance Dashboard for amp precision

Executive Summary
We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
torchbench suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
huggingface suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
timm_models suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Performance graphs
Performance Dashboard for amp precision

Executive Summary
We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff
For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.
Current report name: /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name: /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803

Passrate diff
Geometric mean speedup diff
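The diff described above is just a per-compiler delta between the two most recent summary rows. A minimal sketch, with invented numbers standing in for the real report summaries:

```python
# Hypothetical summary rows for one compiler from the two most recent
# reports; the dashboard prints the delta for each summary metric.
prev = {"passrate": 0.90, "geomean_speedup": 1.35}
curr = {"passrate": 0.92, "geomean_speedup": 1.33}

# Positive passrate diff is an improvement; a negative speedup diff
# indicates the compiler got slower between runs.
diff = {k: round(curr[k] - prev[k], 4) for k in prev}
print(diff)
```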
Warnings
We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time
bench_logs/comp_time_over_time.png
bench_logs/memory_over_time.png

Recent Regressions
For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
No regressions found.

Regressions for huggingface
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
No regressions found.

Regressions for timm_models
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_082_23_03_23_performance_amp_803
Accuracy regressions
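The regression check described above reduces to a set difference: a "recent regression" is a model flagged in the current report that was not flagged in the previous one. A minimal sketch with hypothetical model names (the real pipeline parses the flagged sets from the cron-log reports listed above):

```python
# Hypothetical flagged-model sets parsed from two consecutive reports.
prev_flagged = {"hf_Bert", "resnet50"}
curr_flagged = {"hf_Bert", "resnet50", "mobilenet_v2"}

# Previously unflagged models that are now flagged as problematic.
regressions = sorted(curr_flagged - prev_flagged)
print(regressions if regressions else "No regressions found.")
```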
torchbench suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary
Run name: day_083_24_03_23_performance_amp_938
Commit hashes: pytorch commit c757647
TorchDynamo config flags:
Torch version: torch 2.1.0a0+gitc757647
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary
We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff
For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.
Current report name: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name: /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938

Passrate diff
Geometric mean speedup diff
Warnings
We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time
bench_logs/memory_over_time.png
bench_logs/passrate_over_time.png

Recent Regressions
For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Performance speedup regressions

Regressions for huggingface
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
No regressions found.

Regressions for timm_models
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_083_24_03_23_performance_amp_938
Performance speedup regressions
Compilation latency (sec) regressions
torchbench suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision
Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary
Run name: day_084_25_03_23_performance_amp_808
Commit hashes: pytorch commit dc45ad7
TorchDynamo config flags:
Torch version: torch 2.1.0a0+gitdc45ad7
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary
We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff
For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.
Current report name: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382
Previous report name: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
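The "Recent Regressions" sections later in each report flag previously unflagged models that now trip one of these warnings. A minimal sketch of that diffing step (the dict-of-speedups report shape and the 0.95 threshold are assumptions for illustration):

```python
def newly_flagged(current, previous, threshold=0.95):
    """Return models below the speedup threshold in the current report
    that were not below it in the previous one. Hypothetical report
    shape: mapping of model name -> measured speedup."""
    flagged_now = {m for m, s in current.items() if s < threshold}
    flagged_before = {m for m, s in previous.items() if s < threshold}
    return sorted(flagged_now - flagged_before)
```

Only newly problematic models are surfaced, so a model that has been slow for weeks does not re-appear in every daily report.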
Metrics over time

bench_logs/memory_over_time.png
bench_logs/passrate_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382
Previous report: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Performance speedup regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382
Previous report: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

No regressions found.

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382
Previous report: /data/home/williamwen/cluster/cron_logs/day_084_25_03_23_performance_amp_808

Accuracy regressions
Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_085_26_03_23_performance_amp_382
Commit hashes: pytorch commit 542fb0b
TorchDynamo config flags
Torch version: torch 2.1.0a0+git542fb0b
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
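The peak memory footprint compression ratio can be read as eager-mode peak memory divided by compiled-mode peak memory, which is consistent with "higher is better" above. A sketch under that assumption (the byte counts in the example are illustrative):

```python
def memory_compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Ratio > 1.0 means the compiled run had a smaller peak memory
    footprint than native eager mode; < 1.0 means it used more."""
    if compiled_peak_bytes <= 0:
        raise ValueError("peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes
```

So a model whose eager run peaks at 8 GiB but whose compiled run peaks at 4 GiB reports a compression ratio of 2.0.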
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689
Previous report name: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/passrate_over_time.png
bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689
Previous report: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Peak Memory Compression Ratio regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689
Previous report: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

No regressions found.

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689
Previous report: /data/home/williamwen/cluster/cron_logs/day_085_26_03_23_performance_amp_382

Accuracy regressions
Performance speedup regressions
Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_086_27_03_23_performance_amp_689
Commit hashes: pytorch commit 08c1d1a
TorchDynamo config flags
Torch version: torch 2.1.0a0+git08c1d1a
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
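The passrate rows above report the share of models that pass the accuracy check for each compiler. A sketch of how such a cell could be rendered (the per-model result-dict shape is an assumption, not the dashboard's actual data model):

```python
def passrate(results):
    """Render a passrate cell from per-model pass/fail results.
    Hypothetical shape: mapping of model name -> bool (passed accuracy)."""
    passing = sum(1 for ok in results.values() if ok)
    total = len(results)
    return f"{100 * passing // total}%, {passing}/{total}"
```

Models that fail this check are then excluded from the speedup, compilation latency, and memory metrics, as the caveat above notes.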
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574
Previous report name: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/passrate_over_time.png
bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574
Previous report: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Accuracy regressions
Performance speedup regressions
Compilation latency (sec) regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574
Previous report: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

No regressions found.

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574
Previous report: /data/home/williamwen/cluster/cron_logs/day_086_27_03_23_performance_amp_689

Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_087_28_03_23_performance_amp_574
Commit hashes: pytorch commit f754be8
TorchDynamo config flags
Torch version: torch 2.1.0a0+gitf754be8
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652
Previous report name: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/geomean_over_time.png
bench_logs/comp_time_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652
Previous report: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

No regressions found.

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652
Previous report: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Performance speedup regressions

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652
Previous report: /data/home/williamwen/cluster/cron_logs/day_087_28_03_23_performance_amp_574

Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_088_29_03_23_performance_amp_652
Commit hashes: pytorch commit 7fc100a
TorchDynamo config flags
Torch version: torch 2.1.0a0+git7fc100a
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667
Previous report name: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/geomean_over_time.png
bench_logs/comp_time_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667
Previous report: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions
Compilation latency (sec) regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667
Previous report: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions
Compilation latency (sec) regressions

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667
Previous report: /data/home/williamwen/cluster/cron_logs/day_088_29_03_23_performance_amp_652

Performance speedup regressions
Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_089_30_03_23_performance_amp_667
Commit hashes: pytorch commit dc2b7aa
TorchDynamo config flags
Torch version: torch 2.1.0a0+gitdc2b7aa
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893
Previous report name: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/comp_time_over_time.png
bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893
Previous report: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Compilation latency (sec) regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893
Previous report: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893
Previous report: /data/home/williamwen/cluster/cron_logs/day_089_30_03_23_performance_amp_667

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_090_31_03_23_performance_amp_893
Commit hashes: pytorch commit 5df59f9
TorchDynamo config flags
Torch version: torch 2.1.0a0+git5df59f9
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Previous report name: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/passrate_over_time.png
bench_logs/comp_time_over_time.png

Recent Regressions

For each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report (compilers: inductor, inductor_no_cudagraphs; suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Previous report: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

Performance speedup regressions

Regressions for huggingface

Current report (compilers: inductor, inductor_no_cudagraphs; suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Previous report: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

No regressions found.

Regressions for timm_models

Current report (compilers: inductor, inductor_no_cudagraphs; suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Previous report: /data/home/williamwen/cluster/cron_logs/day_090_31_03_23_performance_amp_893

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)

timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_091_01_04_23_performance_amp_829
Commit hashes: pytorch commit 92b4620
TorchDynamo config flags
Torch version: torch 2.1.0a0+git92b4620
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Influenced by pytorch/pytorch#93794. This is the initial version of what the dashboard looks like https://torchci-git-fork-huydhn-add-compilers-bench-74abf8-fbopensource.vercel.app/benchmark/compilers Related issue #3783
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Previous report name: /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over timesee morebench_logs/geomean_over_time.png : bench_logs/passrate_over_time.png : Recent Regressionssee moreFor each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).Regressions for torchbenchCurrent report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540 Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829 Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540 Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829 Accuracy regressions
Performance speedup regressions
Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Performance speedup regressions
Peak Memory Compression Ratio regressions
Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829
Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_091_01_04_23_performance_amp_829

Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_092_02_04_23_performance_amp_540
Commit hashes: pytorch commit: 5d62d12
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git5d62d12
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name: /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/passrate_over_time.png]
[Image: bench_logs/memory_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Performance speedup regressions
Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions
Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Accuracy regressions
Performance speedup regressions
Peak Memory Compression Ratio regressions
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_092_02_04_23_performance_amp_540

Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_093_03_04_23_performance_amp_684
Commit hashes: pytorch commit: 4431509
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git4431509
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name: /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/passrate_over_time.png]
[Image: bench_logs/comp_time_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

Performance speedup regressions
Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

No regressions found.

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684
Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_093_03_04_23_performance_amp_684

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_094_04_04_23_performance_amp_473
Commit hashes: pytorch commit: 6887333
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git6887333
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name: /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/passrate_over_time.png]
[Image: bench_logs/comp_time_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions
Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

Performance speedup regressions
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_094_04_04_23_performance_amp_473

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_095_05_04_23_performance_amp_373
Commit hashes: pytorch commit: 1189015
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git1189015
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name: /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/comp_time_over_time.png]
[Image: bench_logs/memory_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Peak Memory Compression Ratio regressions
Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

Peak Memory Compression Ratio regressions
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_095_05_04_23_performance_amp_373

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_096_06_04_23_performance_amp_914
Commit hashes: pytorch commit: 2161be0
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git2161be0
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name: /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/memory_over_time.png]
[Image: bench_logs/passrate_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Performance speedup regressions
Compilation latency (sec) regressions
Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

Performance speedup regressions
Compilation latency (sec) regressions
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_096_06_04_23_performance_amp_914

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_097_07_04_23_performance_amp_979
Commit hashes: pytorch commit: c68a94c
TorchDynamo config flags
Torch version: torch: 2.1.0a0+gitc68a94c
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name: /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/passrate_over_time.png]
[Image: bench_logs/memory_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

Compilation latency (sec) regressions
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_097_07_04_23_performance_amp_979

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_098_08_04_23_performance_amp_214
Commit hashes: pytorch commit: 54b1684
TorchDynamo config flags
Torch version: torch: 2.1.0a0+git54b1684
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name: /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

[Image: bench_logs/comp_time_over_time.png]
[Image: bench_logs/memory_over_time.png]

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.

Regressions for huggingface

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.
Regressions for timm_models

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214
Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_098_08_04_23_performance_amp_214

No regressions found.

torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_099_09_04_23_performance_amp_108
Commit hashes: pytorch commit: 5842444
TorchDynamo config flags:
Torch version: torch: 2.1.0a0+git5842444
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
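The summary metrics above (passrate, geometric mean speedup, mean compilation time, and the geometric mean of the memory compression ratio) can be reproduced from per-model results with a few lines of Python. This is a minimal sketch using hypothetical model names and values (not taken from this report); the actual dashboard aggregates CSVs produced by the benchmark runner.

```python
import math

# Hypothetical per-model results for one compiler backend.
results = [
    {"name": "resnet50", "accuracy": "pass", "speedup": 1.40,
     "compile_s": 35.0, "mem_ratio": 0.95},
    {"name": "hf_Bert", "accuracy": "pass", "speedup": 1.75,
     "compile_s": 48.0, "mem_ratio": 1.10},
    {"name": "dlrm", "accuracy": "fail", "speedup": 0.0,
     "compile_s": 0.0, "mem_ratio": 0.0},
]

# Models that fail accuracy checks are excluded from performance aggregates.
passing = [r for r in results if r["accuracy"] == "pass"]

passrate = len(passing) / len(results)
geomean_speedup = math.exp(
    sum(math.log(r["speedup"]) for r in passing) / len(passing)
)
mean_compile_s = sum(r["compile_s"] for r in passing) / len(passing)
geomean_mem_ratio = math.exp(
    sum(math.log(r["mem_ratio"]) for r in passing) / len(passing)
)

print(f"passrate={passrate:.0%}, geomean speedup={geomean_speedup:.2f}x")
```

Geometric (rather than arithmetic) means are used for ratios so that a 2x speedup on one model and a 0.5x slowdown on another average out to 1x.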
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531
Previous report name: /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Passrate diff
Geometric mean speedup diff
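The diff rows are element-wise differences between the two reports' per-compiler summary tables. A sketch with invented passrate values (the compiler names are the ones used throughout this dashboard; the numbers are hypothetical):

```python
# Hypothetical passrates (fraction of models passing) from two reports.
current = {"inductor": 0.89, "inductor_no_cudagraphs": 0.91}
previous = {"inductor": 0.92, "inductor_no_cudagraphs": 0.91}

# Positive values mean the metric improved since the previous report.
passrate_diff = {
    compiler: current[compiler] - previous[compiler]
    for compiler in current
    if compiler in previous
}
print(passrate_diff)
```

The same element-wise subtraction applies to the geometric mean speedup diff below.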
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over time

bench_logs/geomean_over_time.png
bench_logs/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the 2 most recent reports (that actually ran the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531
Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108
Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531
Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108

Performance speedup regressions
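The regression check described here amounts to a set difference between the models flagged in the current report and those flagged in the previous one. A sketch with hypothetical flagged sets (the model names are illustrative only):

```python
# Models flagged as problematic in the 'Warnings' section of each report
# (hypothetical examples).
previously_flagged = {"hf_Longformer", "dlrm"}
currently_flagged = {"hf_Longformer", "dlrm", "tacotron2"}

# A regression is a model flagged now that was not flagged before.
new_regressions = sorted(currently_flagged - previously_flagged)
print(new_regressions)  # ['tacotron2']
```

Models that were already flagged in the previous report are deliberately excluded, so each regression is reported only once.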
Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions
Regressions for huggingfaceCurrent report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108 Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108 Accuracy regressions
Performance speedup regressions
Peak Memory Compression Ratio regressions
Regressions for timm_modelsCurrent report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108 Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_099_09_04_23_performance_amp_108 Compilation latency (sec) regressions
Peak Memory Compression Ratio regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_100_10_04_23_performance_amp_531
Commit hashes: pytorch commit: ab385bd
TorchDynamo config flags:
Torch version: torch: 2.1.0a0+gitab385bd
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766
Previous report name: /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over timesee morebench_logs/comp_time_over_time.png : bench_logs/memory_over_time.png : Recent Regressionssee moreFor each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).Regressions for torchbenchCurrent report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Performance speedup regressions
Peak Memory Compression Ratio regressions
Regressions for huggingfaceCurrent report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Performance speedup regressions
Regressions for timm_modelsCurrent report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_100_10_04_23_performance_amp_531 No regressions found. torchbench suite with amp precisionsee morePerformance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_101_11_04_23_performance_amp_766
Commit hashes: pytorch commit: 9c5473b
TorchDynamo config flags: torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
Torch version: torch: 2.1.0a0+git9c5473b
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274
Previous report name: /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over timesee morebench_logs/comp_time_over_time.png : bench_logs/memory_over_time.png : Recent Regressionssee moreFor each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).Regressions for torchbenchCurrent report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Performance speedup regressions
Compilation latency (sec) regressions
Regressions for huggingfaceCurrent report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 No regressions found. Regressions for timm_modelsCurrent report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_101_11_04_23_performance_amp_766 Compilation latency (sec) regressions
torchbench suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_102_12_04_23_performance_amp_274
Commit hashes: pytorch commit: 46a31e9
TorchDynamo config flags:
Torch version: torch: 2.1.0a0+git46a31e9
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
Performance Dashboard for amp precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward pass for training, and the forward pass only for inference. For accuracy, we check the numerical correctness of forward pass outputs and gradients by comparing with native pytorch. We measure speedup by normalizing against the performance of native pytorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio.

Caveats
To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
Summary Statistics Diff

For each relevant compiler, we compare the summary statistics for the 2 most recent reports that actually ran the compiler.

Current report name: /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153
Previous report name: /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274

Passrate diff
Geometric mean speedup diff
Warnings

We flag models where:
Accuracy warnings
Performance speedup warnings
Compilation latency (sec) warnings
Peak Memory Compression Ratio warnings
Metrics over timesee morebench_logs/geomean_over_time.png : bench_logs/comp_time_over_time.png : Recent Regressionssee moreFor each relevant compiler, we compare the most recent 2 reports (that actually run the compiler) to find previously unflagged models that are now flagged as problematic (according to the 'Warnings' section).Regressions for torchbenchCurrent report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Performance speedup regressions
Peak Memory Compression Ratio regressions
Regressions for huggingfaceCurrent report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 No regressions found. Regressions for timm_modelsCurrent report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_103_13_04_23_performance_amp_153 Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/cron_logs/day_102_12_04_23_performance_amp_274 No regressions found. torchbench suite with amp precisionsee morePerformance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with amp precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Performance graphs

Build Summary

Run name: day_103_13_04_23_performance_amp_153
Commit hashes: pytorch commit: 979c5b4
TorchDynamo config flags:
Torch version: torch: 2.1.0a0+git979c5b4
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8500
@williamwen42, may I know if the data is training performance or inference performance?
This was for training. |
The new dashboard is at https://hud.pytorch.org/benchmark/compilers - Closing the issue. |
Quick question: was nvprims_nvfuser removed from the backends? It's not in the output of print(torchdynamo.list_backends()) nor in the dashboard above, but it is in the documentation.
@andreigh Yes, it was. It was also removed from the docs here: https://pytorch.org/docs/main/torch.compiler.html, which you should go to for the most up-to-date info. A short rationale was discussed here: https://dev-discuss.pytorch.org/t/question-about-nvfuser-being-removed/1453/2?u=msaroufim
@williamwen42 |
@yinrun this is a legacy dashboard; current performance metrics can be seen at https://hud.pytorch.org/benchmark/compilers. The entrypoints can be found at
Dashboard to track the performance of different backends.
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire